Abstract

We investigate network exploration by random walks defined via stationary and adaptive transition probabilities on large graphs. We derive an exact formula, valid for arbitrary graphs and arbitrary walks with stationary transition probabilities (STP), for the average number of discovered edges as a function of time. We show that for STP walks site and edge exploration obey the same scaling, $\sim n^{\lambda}$, as a function of time $n$. Therefore, edge exploration on graphs with many loops always lags behind site exploration, and the revealed graph remains sparse until almost all nodes have been discovered. We then introduce the Edge Explorer Model, which represents a novel class of adaptive walks that perform faithful network discovery even on dense networks.



1 Introduction 

Random walk theory [1, 2, 3] has seen myriad applications, ranging from physics, biology and ecology through market models and finance to problems in mathematics and computer science. It is used to sample distributions, compute volumes and solve convex optimization problems [4], and it has played a key role in WWW search engines [5]. It provides a microscopic description of real-world transport processes on networks, such as information spread [6, 7] and disease propagation (epidemics) [8, 9, 10], and it can also be used to design network discovery/exploration tools [11]. Here we focus on the latter aspect.

The structure of real-world networks [12, 13] is organically evolving, and frequently either their size is simply prohibitive for measuring their topology (the WWW has $\sim 2\times 10^{10}$ nodes), or the nature of the network makes it difficult to gather global information (e.g., in some social networks). Such networks are best explored by 'walkers' stepping from a node to a neighbor node linked by an edge, collecting local information, which is then combined to produce subgraph samples of the original graph.

To fix notation, we denote by $G(V, E)$ the graph on which the walk happens, where $V$ is the set of $N$ nodes and $E$ is the set of $M$ edges, and by $p(s'|s;t)$ the single-step transition probability for a walker at site (node) $s$ to step onto a neighboring site $s'$ ($(s, s') \in E$) on the $t$-th step. Equivalently, one could consider the walk as taking place on the complete graph $K_N$, setting $p(s'|s;t) = 0$ for $(s', s) \notin E$. One can think of $p(s'|s;t)$ as information 'handed' to the walker at node $s$ to follow in its choice of the next node. Accordingly, an important optimization problem is to 'design' the probabilities $p(s'|s;t)$ such that certain properties of the exploration are optimal. Such problems motivate the development of the statistical mechanics of network discovery, connecting the set of local transition probabilities $\{p(s'|s;t)\}$ with the global properties of the uncovered subgraph as a function of time. We distinguish two main classes of exploration problems, namely those with: I. stationary transition probabilities (STP), where $p(s'|s;t) = p(s'|s)$ (time-independent), and II. adaptive transition probabilities (ATP), where $p(s'|s;t)$ depends on time and possibly on the past history of the walk. For general STP walks, analytic results were obtained for the number $S_n$ of distinct (virgin) nodes visited in $n$ steps (site exploration) [14, 15, 16] (for a review see [1]), and for the cover time $T_V$ (the expected number of steps to visit all nodes; for a review see [17]). Numerically, site exploration by simple random walks, $p(s'|s) = k_s^{-1}$ ($k_s$ is the degree of node $s$), has been extensively studied on various complex network models [18, 19].

*E-mail: toro@nd.edu

Interestingly, the number $X_n$ of distinct edges visited in $n$ steps (edge exploration) has only been studied numerically, for simple random walks [20, 21], and no analytic results similar to those for $S_n$ have been derived. The statistics of $X_n$, however, cannot be obtained directly from the analytic results for $S_n$ on a "dual" graph, such as the edge-to-vertex dual graph $L(G)$, because the walk does not transform simply onto $L(G)$. On graphs with loops, there is an inherent asymmetry between the evolution of $S_n$ and $X_n$. While a new node is always discovered via a new edge ($S_{n+1} = S_n + 1$ implies $X_{n+1} = X_n + 1$, Fig. 1(a)), a new edge can also be discovered between two previously visited nodes (Fig. 1(b)). In the latter case, the walker always encloses a loop in the discovered subgraph; hence the pair $(S_n, X_n)$ can be connected to the loop statistics of the network. More precisely, the quantity $Q_n = 1 + X_n - S_n$ gives the number of times the walker returned to its own path through a freshly discovered edge in $n$ steps. Clearly, if $G$ is a tree, then $X_n = S_n - 1$ for all $n \ge 0$.
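The bookkeeping behind $S_n$, $X_n$ and $Q_n = 1 + X_n - S_n$ can be made concrete with a short sketch. The helper `exploration_counts` below is hypothetical (not from the original work): it takes a walk given as the sequence of occupied nodes and returns the three counts after each step. On a tree $Q_n$ stays at zero, while closing a loop increments it.

```python
def exploration_counts(path):
    """Return S_n, X_n, Q_n (n = 0..len(path)-1) for a walk given as a node sequence.

    path[0] is the start node s0 and path[n] is the node occupied after n steps.
    Edges are undirected, so (u, v) and (v, u) count as the same edge.
    """
    seen_nodes = {path[0]}
    seen_edges = set()
    S, X = [1], [0]  # S_0 = 1 (start node counted), X_0 = 0
    for u, v in zip(path, path[1:]):
        seen_edges.add(frozenset((u, v)))
        seen_nodes.add(v)
        S.append(len(seen_nodes))
        X.append(len(seen_edges))
    # Q_n: returns to the walker's own path through a freshly discovered edge
    Q = [1 + x - s for s, x in zip(S, X)]
    return S, X, Q


# Triangle 0-1-2: the step 2 -> 0 discovers an edge between two visited nodes.
S, X, Q = exploration_counts([0, 1, 2, 0, 1])
print(S, X, Q)  # [1, 2, 3, 3, 3] [0, 1, 2, 3, 3] [0, 0, 0, 1, 1]
```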

In this Letter we provide an exact expression for the generating function of the average number of discovered edges $\langle X_n\rangle$ in $n$ steps, for arbitrary STP walks on arbitrary graphs. Although our expressions are valid in general, we are interested in the scaling behavior of $\langle S_n\rangle$ and $\langle X_n\rangle$ for large times ($n \gg 1$) on large ($N \gg 1$) connected graphs. Let us consider a monotonically increasing sequence $\{c_n\}$ of positive terms $c_n > 0$. We will use the notation $c_n \sim n^{\kappa}$ to say that $c_n$ scales with $n$ with a growth exponent $\kappa$, i.e., as $n \to \infty$, $c_n \simeq n^{\kappa}L(n)$, where $L(n)$ is a slowly varying function, that is, $L(\eta x)/L(x) \to 1$ as $x \to \infty$ for any $\eta > 0$.
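Because $L(n)$ may vary slowly, an exponent read off from finite data can differ noticeably from $\kappa$. As an illustration (a toy example of ours, not from the original text), the sketch below estimates the local log-log slope $\ln(c_{2n}/c_n)/\ln 2$ for $c_n = n/\ln n$, which has $\kappa = 1$ and $L(n) = 1/\ln n$; the estimate approaches 1 only logarithmically slowly.

```python
import math


def local_exponent(c, n, factor=2):
    """Finite-difference estimate of d ln c_n / d ln n between n and factor * n."""
    m = factor * n
    return math.log(c(m) / c(n)) / math.log(m / n)


def c(n):
    # kappa = 1 with the slowly varying factor L(n) = 1 / ln n
    return n / math.log(n)


for n in (10**2, 10**4, 10**6):
    print(n, round(local_exponent(c, n), 3))
```

Corrections of this type are one way fitted exponents can drift on moderately sized systems even when the asymptotic exponent is known.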

As will be seen, for STP walks on large but finite graphs both the average number of discovered nodes and that of discovered edges obey scaling laws, $\langle S_n\rangle \sim n^{\lambda}$ and $\langle X_n\rangle \sim n^{\mu}$. These hold up to a cross-over time $T_V^{\times}$ (for $\langle S_n\rangle$) and $T_E^{\times}$ (for $\langle X_n\rangle$), after which saturation sets in until all the nodes (edges) have been discovered, at the corresponding cover times $T_V$ and $T_E$. At the cross-over time only a small constant number of nodes (edges) are left untouched, and we consider this as the time at which the discovery has practically been completed. Since in a step at most one node (edge) can be discovered, the growth of $S_n$ ($X_n$) is at most linear at any time, for any walk (STP or ATP); that is, $\lambda \le 1$, $\mu \le 1$. When the walker is in completely charted territory, both $S_n$ and $X_n$ stagnate; otherwise $X_n$ always grows (Fig. 1(a,b)), and hence $\mu \ge \lambda$.

Figure 1: (a) A new node (empty circle) is always discovered via a new edge (dashed line). (b) A new edge can also be discovered between already visited nodes. (c) The first-passage time through an edge can be computed from the first-passage time to a pseudonode $z$ placed on the edge.

Here we show that for recurrent STP walks both site and edge exploration obey the same scaling, that is, $\lambda = \mu$ in the $N \to \infty$ limit. This means that in dense graphs, where the number of edges is $M \sim N^{\nu}$ with $\nu > 1$ (but $\nu \le 2$), the node set $V$ is discovered much earlier than the edge set $E$ ($T_V^{\times} \ll T_E^{\times}$). As we prove below, even on the complete graph on $N$ nodes, $G = K_N$, STP walks explore nodes and edges at the same rate, $\mu = \lambda$. This is counterintuitive, because there are $O(N^2)$ edges and the walker could keep discovering many new edges between already visited nodes, so there is no obvious reason why we could not have $\mu > \lambda$. The fact that for STP walks edge and site exploration grow at the same rate presents a problem if one is interested in discovering the links (relationships) in a network. Moreover, if network discovery is done with the purpose of sampling and producing a subgraph with statistical properties resembling those of the underlying network, then STP walks will not provide the optimal solution, independently of the form of the transition matrix $p(s|s')$. This is simply because a walker's choice to move to a neighbor is independent of its visiting history, and it therefore has, on average, a lower chance to discover a virgin edge to a visited neighbor (Fig. 1(b)) than, e.g., an ATP walk that is biased towards already visited neighbors. Hence, for a given number of visited nodes, an STP walk reveals only $\langle X_n\rangle = O(\langle S_n\rangle)$ edges before $T_V^{\times}$, making the discovered subgraph sparse and seriously skewing the sample, especially if the underlying network is dense ($\nu > 1$). To resolve this, we introduce an ATP walk, the Edge Explorer Model (EEM), that performs a faithful exploration of the nodes and edges even on dense networks, such that $T_E^{\times} \sim T_V^{\times}$.

2 STP walks 

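The generating function $\mathcal{S}(s_0;\xi) = \sum_{n=0}^{\infty}\langle S_n\rangle\xi^n$ for the average number of distinct sites discovered in $n$ steps by the walker starting from site $s_0$ can be written as [1]:

$$\mathcal{S}(s_0;\xi) = \frac{1}{1-\xi}\sum_{s\in V}W_S(s;\xi)\,P(s|s_0;\xi), \tag{1}$$

$$W_S(s;\xi) = [P(s|s;\xi)]^{-1}, \tag{2}$$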

where $P(s|s_0;\xi)$ is the site occupation probability generating function, that is, $P(s|s_0;\xi) = \sum_{n=0}^{\infty}\xi^n P_n(s|s_0)$, with $P_n(s|s_0)$ being the probability for a walker starting from $s_0$ to be found at $s$ on the $n$-th step. Next we derive a similar expression for $\mathcal{X}(s_0;\xi) = \sum_{n=0}^{\infty}\langle X_n\rangle\xi^n$. Let $F_n(s|s_0)$ be the first-passage time distribution (the probability for the walker to arrive at $s$ for the first time on the $n$-th step) and let $F(s|s_0;\xi)$ be its generating function. It is well known [1] that $F(s|s_0;\xi) = [P(s|s_0;\xi) - \delta_{s,s_0}]/P(s|s;\xi)$. The probability that $s$ is ever reached by the walker starting from $s_0$ is therefore $F(s|s_0) = \sum_{n=1}^{\infty}F_n(s|s_0) = \lim_{\xi\to 1^-}F(s|s_0;\xi)$. Since $F(s|s_0) \le 1$, $F(s|s_0;1^-)$ is convergent; however, $P(s|s_0;1^-)$ can diverge. If $F(s|s_0) = 1$ for all $s, s_0$, then the walk is recurrent, $P(s|s_0;1^-) = \infty$ and $F(s|s_0;1^-) = 1$. Moreover, in this case $P(s|s_0;1^-)$ has the same rate of divergence for all $s, s_0 \in V$. For finite networks in which the walker can access all nodes and there are no traps, the walk is recurrent.
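The relation $F(s|s_0;\xi) = [P(s|s_0;\xi) - \delta_{s,s_0}]/P(s|s;\xi)$ can be checked numerically on a small graph. The sketch below is illustrative only (the graph, the value $\xi = 0.7$ and the truncation order are arbitrary choices): it builds a simple random walk with `T[r][r2]` $= p(r|r_2)$, sums the occupation series directly, and computes the first-passage series independently by making the target site absorbing.

```python
def occupation_gf(T, s0, xi, n_max=400):
    """Truncated P(s|s0; xi) = sum_n xi^n P_n(s|s0), returned as a list over s."""
    N = len(T)
    dist = [1.0 if r == s0 else 0.0 for r in range(N)]  # P_0(.|s0)
    gf, w = dist[:], 1.0
    for _ in range(n_max):
        dist = [sum(T[r][r2] * dist[r2] for r2 in range(N)) for r in range(N)]
        w *= xi
        gf = [g + w * d for g, d in zip(gf, dist)]
    return gf


def first_passage_gf(T, s, s0, xi, n_max=400):
    """Truncated F(s|s0; xi), computed by absorbing the walker when it hits s."""
    N = len(T)
    dist = [1.0 if r == s0 else 0.0 for r in range(N)]
    total, w = 0.0, 1.0
    for _ in range(n_max):
        dist = [sum(T[r][r2] * dist[r2] for r2 in range(N)) for r in range(N)]
        w *= xi
        total += w * dist[s]  # arriving at s now, for the first time
        dist[s] = 0.0         # later arrivals are not first passages
    return total


# Simple random walk on a 4-cycle with one chord: T[r][r2] = 1/k_{r2} if adjacent.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N = 4
deg = [sum(1 for e in edges if r in e) for r in range(N)]
T = [[0.0] * N for _ in range(N)]
for u, v in edges:
    T[v][u] = 1.0 / deg[u]
    T[u][v] = 1.0 / deg[v]

xi = 0.7
P = [occupation_gf(T, s0, xi) for s0 in range(N)]  # P[s0][s] = P(s|s0; xi)
max_dev = max(
    abs(first_passage_gf(T, s, s0, xi) - (P[s0][s] - (s == s0)) / P[s][s])
    for s in range(N) for s0 in range(N)
)
print(f"max deviation: {max_dev:.2e}")
```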

Let $F_n(e|s_0)$ denote the edge first-passage time distribution, i.e., the probability that the walker passes through edge $e = (s, s') \in E$ for the first time on the $n$-th step, given that it started at node $s_0$. Introducing an indicator $\Gamma_n(s_0) \in \{0, 1\}$ for the number of virgin edges discovered on the $n$-th step, we have $\langle\Gamma_n\rangle = \mathrm{Prob}\{\Gamma_n = 1\} = \sum_{e\in E}F_n(e|s_0)$, with $\Gamma_0 = \langle\Gamma_0\rangle = 0$. Clearly, $X_n(s_0) = \sum_{j=0}^{n}\Gamma_j(s_0)$, and thus the generating function for the average number of distinct edges visited in $n$ steps becomes:

$$\mathcal{X}(s_0;\xi) = \frac{\Gamma(s_0;\xi)}{1-\xi} = \frac{1}{1-\xi}\sum_{e\in E}F(e|s_0;\xi), \tag{3}$$

where $\Gamma(s_0;\xi)$ and $F(e|s_0;\xi)$ are the generating functions of $\langle\Gamma_n(s_0)\rangle$ and $F_n(e|s_0)$, respectively.

To obtain the edge first-passage time distribution, we place an auxiliary site $z$ on edge $e = (s, s') \in E$ (Fig. 1(c)) and redefine the walk on this new graph $G_z$ such that the node first-passage time probability to $z$ on $G_z$ is the same as the edge first-passage time probability through $e$ on $G$. The extended graph $G_z(V_z, E_z)$ has $V_z = V \cup \{z\}$ and $E_z = \{(s, z), (z, s')\} \cup E \setminus \{(s, s')\}$. The addition of $z$ to $e = (s, s')$ changes only the transition probabilities around that edge, leaving $p(r|r')$ unchanged away from $s$ and $s'$. Steps from $s$ ($s'$) to $s'$ ($s$) are forbidden in the new walk; instead, the walker has to step onto node $z$ first. However, the modified walk must carry the same probability flow as the original one when moving from sites $s$ and $s'$ towards $z$. From $z$ the walker is only allowed to step to $s$ or $s'$, with arbitrary probabilities $f$ and $g$ ($f + g = 1$), respectively (which, however, should not enter the final expression for $F(e|s_0;\xi)$). The single-step transition probabilities of the new walk on $G_z$ can thus be combined into ($r, r' \in V_z$):

$$p^{\dagger}(r|r') = (1-\delta_{r'z}-\delta_{rz}-\delta_{r's'}\delta_{rs}-\delta_{r's}\delta_{rs'})\,p(r|r') + \delta_{rz}\delta_{r's'}\,p(s|s') + \delta_{rz}\delta_{r's}\,p(s'|s) + \delta_{r'z}\,(f\delta_{rs}+g\delta_{rs'}). \tag{4}$$
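The construction in (4) can be tested directly: the first-passage distribution through edge $e$ on $G$ should coincide, step by step, with the first-passage distribution to the pseudonode $z$ on $G_z$, for any value of $f$. The sketch below is our own check (the graph, the probed edge and the values of $f$ are arbitrary); it propagates probability vectors exactly rather than sampling walks.

```python
def step(T, dist):
    """One step of the walk: dist'(r) = sum_r2 p(r|r2) dist(r2)."""
    n = len(dist)
    return [sum(T[r][r2] * dist[r2] for r2 in range(n)) for r in range(n)]


def edge_first_passage(T, s, sp, s0, n_max):
    """F_n(e|s0) for e = (s, sp) on G, n = 0..n_max (mass crossing e is absorbed)."""
    dist = [1.0 if r == s0 else 0.0 for r in range(len(T))]
    F = [0.0]
    for _ in range(n_max):
        cross_fwd, cross_bwd = dist[s] * T[sp][s], dist[sp] * T[s][sp]
        F.append(cross_fwd + cross_bwd)
        dist = step(T, dist)
        dist[sp] -= cross_fwd  # remove walkers that have just crossed e
        dist[s] -= cross_bwd
    return F


def extend_with_pseudonode(T, s, sp, f):
    """Transition matrix of the walk on G_z, Eq. (4); z gets index len(T)."""
    N = len(T)
    z = N
    Tz = [[T[r][r2] if r < N and r2 < N else 0.0 for r2 in range(N + 1)]
          for r in range(N + 1)]
    Tz[sp][s] = Tz[s][sp] = 0.0               # direct s <-> s' steps forbidden
    Tz[z][s], Tz[z][sp] = T[sp][s], T[s][sp]  # same flow towards z as along e
    Tz[s][z], Tz[sp][z] = f, 1.0 - f          # from z: to s (f) or to s' (g = 1 - f)
    return Tz


def node_first_passage(T, target, s0, n_max):
    """F_n(target|s0), n = 0..n_max, by absorbing walkers at the target."""
    dist = [1.0 if r == s0 else 0.0 for r in range(len(T))]
    F = [0.0]
    for _ in range(n_max):
        dist = step(T, dist)
        F.append(dist[target])
        dist[target] = 0.0
    return F


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N = 4
deg = [sum(1 for e in edges if r in e) for r in range(N)]
T = [[0.0] * N for _ in range(N)]
for u, v in edges:
    T[v][u], T[u][v] = 1.0 / deg[u], 1.0 / deg[v]

s, sp = 0, 2  # the edge e = (s, s') being probed
worst = 0.0
for f in (0.3, 0.8):
    Tz = extend_with_pseudonode(T, s, sp, f)
    for s0 in range(N):
        Fe = edge_first_passage(T, s, sp, s0, 50)
        Fz = node_first_passage(Tz, N, s0, 50)
        worst = max(worst, max(abs(a - b) for a, b in zip(Fe, Fz)))
print(f"largest difference: {worst:.1e}")
```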

The rest of the calculation focuses on obtaining the node first-passage time distributions $F^{\dagger}_n(z|s_1)$ and the site occupation probabilities $P^{\dagger}_n(r|s_1)$ of the modified walk ($r, s_1 \in V_z$). By construction, $F_n(e|s_0) = F^{\dagger}_n(z|s_0)$, $s_0 \in V$, or $F(e|s_0;\xi) = F^{\dagger}(z|s_0;\xi) = P^{\dagger}(z|s_0;\xi)/P^{\dagger}(z|z;\xi)$, $s_0 \ne z$. Obtaining the $P^{\dagger}$ generating functions in terms of the original functions $P$ involves a lengthy series of Green-function manipulations, using the formalism developed for 'taboo sites' [1, ?]; see Appendix A. The final result, after using (3), is:

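$$\mathcal{X}(s_0;\xi) = \frac{\xi}{1-\xi}\sum_{s\in V}W(s;\xi)\,P(s|s_0;\xi), \tag{5}$$

with

$$W(s;\xi) = \sum_{s'\in V}\frac{\alpha\,[1+(d-c)\beta\xi]}{1+(a\alpha+d\beta)\xi+(ad-bc)\alpha\beta\xi^2}, \tag{6}$$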

where $a = a(\xi) = P(s|s';\xi)$, $b = b(\xi) = P(s|s;\xi)$, $c = c(\xi) = P(s'|s';\xi)$, $d = d(\xi) = P(s'|s;\xi)$, and $\alpha = p(s'|s)$, $\beta = p(s|s')$. The form (5) is similar to (1), but with a more involved weight function, $\xi W(s;\xi)$ instead of $[P(s|s;\xi)]^{-1}$. This shows that edge exploration is in general quite different from node exploration. Expressions (5)-(6) are entirely general, valid for any type of STP walk (including asymmetric walks, $p(s|s') \ne p(s'|s)$) on arbitrary graphs. The properties of $\xi b(\xi)W(s;\xi)$ fully determine the statistics of edge exploration relative to site exploration. Note that the summation in the expression of the weight $W(s;\xi)$ runs only over the network neighbors of $s$, due to the multiplicative transition probability $\alpha = p(s'|s)$, which is zero if $s'$ and $s$ are not neighbors in the graph. Next, we discuss some special cases for simple random walks.
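Since (5)-(6) hold for any STP walk, they can be checked against brute force on a small, irregular graph. The sketch below is our own test (graph and $\xi$ are arbitrary choices): it enumerates all (position, visited-edge-set) states to obtain $\langle X_n\rangle$ exactly, sums $\sum_n\langle X_n\rangle\xi^n$, and compares the result with (5)-(6), where the $P$'s come from truncated power series.

```python
from collections import defaultdict


def occupation_matrix(T, xi, n_max=200):
    """Pm[s0][s] = P(s|s0; xi) from the truncated series sum_n xi^n P_n(s|s0)."""
    N = len(T)
    Pm = []
    for s0 in range(N):
        dist = [1.0 if r == s0 else 0.0 for r in range(N)]
        gf, w = dist[:], 1.0
        for _ in range(n_max):
            dist = [sum(T[r][r2] * dist[r2] for r2 in range(N)) for r in range(N)]
            w *= xi
            gf = [g + w * x for g, x in zip(gf, dist)]
        Pm.append(gf)
    return Pm


def edge_gf_formula(T, adj, s0, xi):
    """X(s0; xi) from Eqs. (5)-(6); T[r][r2] = p(r|r2)."""
    Pm = occupation_matrix(T, xi)

    def P(target, start):  # P(target|start; xi)
        return Pm[start][target]

    total = 0.0
    for s in adj:
        W = 0.0
        for sp in adj[s]:  # alpha = p(s'|s) vanishes for non-neighbours
            a, b, c, d = P(s, sp), P(s, s), P(sp, sp), P(sp, s)
            alpha, beta = T[sp][s], T[s][sp]
            num = alpha * (1 + (d - c) * beta * xi)
            den = 1 + (a * alpha + d * beta) * xi + (a * d - b * c) * alpha * beta * xi**2
            W += num / den
        total += W * P(s, s0)
    return xi / (1 - xi) * total


def exact_mean_edges(T, adj, s0, n_max):
    """<X_n>, n = 0..n_max, by exact enumeration of (position, visited edge set)."""
    states = {(s0, frozenset()): 1.0}
    means = [0.0]
    for _ in range(n_max):
        new = defaultdict(float)
        for (u, seen), prob in states.items():
            for v in adj[u]:
                new[(v, seen | {frozenset((u, v))})] += prob * T[v][u]
        states = new
        means.append(sum(prob * len(seen) for (_, seen), prob in states.items()))
    return means


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
N = len(adj)
T = [[0.0] * N for _ in range(N)]
for u in adj:
    for v in adj[u]:
        T[v][u] = 1.0 / len(adj[u])  # simple random walk, p(v|u) = 1/k_u

xi = 0.5
for s0 in range(N):
    series = sum(x * xi**n for n, x in enumerate(exact_mean_edges(T, adj, s0, 80)))
    print(s0, round(series, 10), round(edge_gf_formula(T, adj, s0, xi), 10))
```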

3 Special cases 

I. Simple random walks on $K_N$. The single-step probabilities of a simple random walker can be written as $p(s|s') = p(1-\delta_{ss'})$, with $p = 1/(N-1)$. In this case

$$P(s|s_0;\xi) = \left[\delta_{s,s_0} - N^{-1}\right](1+p\xi)^{-1} + N^{-1}(1-\xi)^{-1},$$

and $\mathcal{S}(\xi)$ and $\mathcal{X}(\xi)$ are easily obtained. In particular,

$$\mathcal{S}(\xi) = (1-\xi)^{-1}(1+p\xi)\left[1-(1-p)\xi\right]^{-1},$$

from which, via contour integration, $\langle S_n\rangle = [1+p-(1-p)^n]/p$. Similarly,

$$W(s;\xi) = [1+(a+b)p\xi]^{-1},$$

and since in this case $\xi a = b - 1$, we have

$$W(s;\xi) = [1-p+(1+\xi)pb]^{-1}.$$
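From (5) and the expression for $b = P(s|s;\xi)$ it follows:

$$\mathcal{X}(\xi) = \frac{\xi}{1-\xi}\,\frac{1-\xi+Np\xi}{1+(2p-1)\xi+2p(p-1)\xi^2}. \tag{7}$$

After contour integration we obtain the exact expression

$$\langle X_n\rangle = \frac{1+p}{2p^2} + \frac{q_1q_2}{q_1-q_2}\left[\frac{1+pq_1}{q_1^{\,n}(q_1-1)} - \frac{1+pq_2}{q_2^{\,n}(q_2-1)}\right], \tag{8}$$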

for $n \ge 0$. Here $q_1 > 1$ and $q_2 < -1$ are the roots of the quadratic equation $2p(1-p)\xi^2 + (1-2p)\xi - 1 = 0$, i.e., the zeros of the denominator in (7). Fig. 2(a) shows the agreement between simulations and the analytical formulas for $\langle S_n\rangle$ and $\langle X_n\rangle$. From the above, for large graphs ($p \ll 1$), $\langle S_n\rangle - 1 = p^{-1}[1-(1-p)^n] = n - \binom{n}{2}p + \dots$, showing that $\langle S_n\rangle \sim n$ in the regime $np \ll 1$, or $n \ll N$. Similarly, for large graphs $q_1 = 1 + 2p^2 + O(p^3)$ and $q_2 = -\frac{1}{2p} - \frac{1}{2} + O(p)$, yielding $\langle X_n\rangle \simeq (1+p)(n-1+\dots)$ for $n \ll N^2$. The cover times can also be calculated, yielding $T_V \simeq N\ln N$ (coinciding with [?]) and $T_E \simeq N^2\ln N$. Thus, both $\langle S_n\rangle$ and $\langle X_n\rangle$ grow together, linearly ($\lambda = \mu = 1$), as functions of time, with $\langle S_n\rangle$ saturating to $N$ at $T_V$ and $\langle X_n\rangle$ to $N(N-1)/2$ much later, at $T_E$. Fig. 2(b) shows the linear dependence of $\langle X_n\rangle$ on $\langle S_n\rangle$.
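The closed forms for $K_N$ are easy to cross-check against exact enumeration for a small complete graph. The sketch below is our own check (the choice $N = 5$ and the horizon of 15 steps are arbitrary): it evaluates $\langle S_n\rangle = [1+p-(1-p)^n]/p$ and Eq. (8), with $q_{1,2}$ taken as the roots of $2p(1-p)\xi^2 + (1-2p)\xi - 1 = 0$, and compares them with a brute-force average over all walks.

```python
import math
from collections import defaultdict


def kn_mean_sites(N, n):
    """<S_n> on K_N: [1 + p - (1 - p)^n] / p with p = 1/(N - 1)."""
    p = 1.0 / (N - 1)
    return (1 + p - (1 - p) ** n) / p


def kn_mean_edges(N, n):
    """<X_n> on K_N from Eq. (8)."""
    p = 1.0 / (N - 1)
    a2, a1 = 2 * p * (1 - p), 1 - 2 * p  # 2p(1-p) xi^2 + (1-2p) xi - 1 = 0
    disc = math.sqrt(a1 * a1 + 4 * a2)
    q1, q2 = (-a1 + disc) / (2 * a2), (-a1 - disc) / (2 * a2)
    bracket = (1 + p * q1) / (q1**n * (q1 - 1)) - (1 + p * q2) / (q2**n * (q2 - 1))
    return (1 + p) / (2 * p * p) + q1 * q2 / (q1 - q2) * bracket


def kn_exact(N, n_max, s0=0):
    """Exact <S_n>, <X_n> on K_N by enumerating (position, visited edge set)."""
    states = {(s0, frozenset()): 1.0}
    S, X = [1.0], [0.0]
    for _ in range(n_max):
        new = defaultdict(float)
        for (u, seen), prob in states.items():
            for v in range(N):
                if v != u:
                    new[(v, seen | {frozenset((u, v))})] += prob / (N - 1)
        states = new
        # visited nodes = start node plus the endpoints of all visited edges
        S.append(sum(pr * len({s0}.union(*seen)) for (_, seen), pr in states.items()))
        X.append(sum(pr * len(seen) for (_, seen), pr in states.items()))
    return S, X


N = 5
S_ex, X_ex = kn_exact(N, 15)
for n in (1, 5, 10, 15):
    print(n, round(S_ex[n], 6), round(kn_mean_sites(N, n), 6),
          round(X_ex[n], 6), round(kn_mean_edges(N, n), 6))
```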
Figure 2: (a) Numerical (blue circles for $\langle S_n\rangle$, red squares for $\langle X_n\rangle$) and analytical (black solid line) results for a complete graph of 500 nodes. (b) Same as (a), with $\langle X_n\rangle$ plotted vs $\langle S_n\rangle$, showing that the discovered graph is sparse even on $K_N$. $\langle S_n\rangle - 1$ is plotted instead of $\langle S_n\rangle$ in order to account for the initial conditions $S_0 = 1$, $X_0 = 0$.

II. Infinite translationally invariant lattices in $d$ dimensions. Simple random walks on such graphs are homogeneous, that is, $p(s|s') = p(l)$ and $P(s|s';\xi) = P(l;\xi)$, where $l = s - s'$. It is known that $P(l;\xi) = (2\pi)^{-d}\int_B d^dk\, e^{-ik\cdot l}[1-\xi\omega(k)]^{-1}$, where the integration is over the first Brillouin zone $B = [-\pi,\pi]^d$ and $\omega(k) = \sum_l p(l)e^{ik\cdot l}$ is the structure function of the walk. While exact formulas are hard to obtain in $d > 4$, the leading-order behavior of $\langle S_n\rangle$ and $\langle X_n\rangle$ can be extracted by applying the discrete Tauberian theorem [?] to the corresponding generating functions. According to this theorem, the scaling $c_n \sim n^{\kappa}$ for $n \gg 1$ is equivalent to the behavior $C(\xi) \sim (1-\xi)^{-\kappa-1}L(1/(1-\xi))$ of the generating function $C(\xi) = \sum_{n=0}^{\infty}\xi^n c_n$ in the limit $\xi \to 1^-$. The results for $\langle S_n\rangle$ are also summarized in [1]; we quote them here along with our results for $\langle X_n\rangle$ for comparison and completeness. For $d = 1$, $\langle S_n\rangle \sim \sqrt{8n/\pi}$ and $\langle X_n\rangle \sim \sqrt{8n/\pi}$. For $d = 2$, on the square lattice $\langle S_n\rangle \sim \pi n/\ln(8n)$ and $\langle X_n\rangle \sim 4\pi n/[3\pi + 2\ln(8n)]$, and on the triangular lattice $\langle S_n\rangle \sim 2\pi n/[\sqrt{3}\ln(12n)]$ and $\langle X_n\rangle \sim 6\pi n/[5\pi + \sqrt{3}\ln(12n)]$. On $d \ge 3$ cubic (hypercubic) lattices, simple random walks are non-recurrent (transient), hence $P(0;1^-) < \infty$ and we obtain

$$\langle S_n\rangle \sim \frac{n}{P(0;1^-)}, \qquad \langle X_n\rangle \sim \frac{2dn}{2d-1+2P(0;1^-)}.$$
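The $d \ge 3$ slopes, and the limiting ratio $\langle X_n\rangle/\langle S_n\rangle$ implied by the recurrent cases, can be read off from (6) in a few lines. On a lattice with coordination number $z$ ($z = 2d$ for hypercubic lattices) all neighbours are equivalent and the walk is symmetric, so $P(s|s';\xi) = P(s'|s;\xi)$, $b = c$, and the last-step relation gives $\xi P(s|s';\xi) = b - 1$, as on $K_N$; (6) then reduces to $W = z/[z - 1 + (1+\xi)b]$. The sketch below is our own reduction, offered as a consistency check rather than the authors' derivation; for the simple cubic lattice it uses Watson's value $P(0;1^-) \approx 1.516386$, and for recurrent lattices a large number stands in for the divergent $b$.

```python
def lattice_weight(z, b, xi=1.0):
    """W = z / [z - 1 + (1 + xi) b] for a lattice with coordination number z."""
    return z / (z - 1 + (1 + xi) * b)


def transient_slopes(d, P0):
    """Leading slopes of <S_n> ~ n / P0 and <X_n> ~ 2d n / (2d - 1 + 2 P0), d >= 3."""
    z = 2 * d
    return 1.0 / P0, lattice_weight(z, P0)


# Simple cubic lattice: P(0; 1^-) = 1.516386... (Watson's integral).
s_slope, x_slope = transient_slopes(3, 1.516386)
print(round(s_slope, 4), round(x_slope, 4), round(x_slope / s_slope, 4))

# Recurrent lattices: b -> infinity as xi -> 1^-, so xi * b * W -> z / 2.
for name, z in (("chain", 2), ("square", 4), ("triangular", 6)):
    b = 1e9  # stand-in for the divergent P(0; 1^-)
    print(name, round(b * lattice_weight(z, b), 6))
```

The recurrent limits $1$, $2$ and $3$ agree with the ratios of the quoted $d = 1$ and $d = 2$ asymptotic forms.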

These special cases suggest that for simple random walks $\lambda = \mu$, i.e., the edges are discovered mostly by visiting new nodes, and once all the nodes have been visited, the remaining edges are discovered at the same rate. This holds for simple random walks on other networks as well, as indicated by our simulations summarized in Fig. 3. Simulations were run on Erdős-Rényi (ER) random graphs [23], random geometric graphs (RGG) [24], the scale-free Barabási-Albert (BA) model [25], the hierarchical network model [26], the fractal Sierpinski gasket and the GSHM fractal network [27]. The curves for $\langle S_n\rangle$ and $\langle X_n\rangle$ almost perfectly overlap or run in parallel ($\lambda = \mu$). For the ER, RGG and BA models the growth rates are linear. For the hierarchical network $\lambda \approx 0.94$ and $\mu \approx 0.99$; for GSHM, $\lambda \approx 0.92$ and $\mu \approx 0.96$. For the Sierpinski gasket $\lambda \approx 0.68$ (same as in [28]) and $\mu \approx 0.73$. In all cases where deviations in the exponents were observed, they were small, $\mu - \lambda < 0.05$. One can show that these deviations are due to correction terms which, although they vanish in the $N \to \infty$ limit, still show up in the simulations (which are run on relatively small networks, in order to observe the saturation). It is possible to prove that $\lambda = \mu$ holds in the $N \to \infty$ limit for general STP walks on arbitrary graphs, by showing that $0 < b(1^-)W(s;1^-) < \infty$; see Appendix B.

Figure 3: Growth of $\langle S_n\rangle - 1$ (blue circles) and $\langle X_n\rangle$ (red circles) for simple random walks. ER ($N = 1000$, $\langle k\rangle = 40$); RGG ($d = 2$, $N = 2000$, $\langle k\rangle = 50$); BA ($N = 5000$, $\langle k\rangle = 10$); hierarchical network ($N = 3125$, $\langle k\rangle = 8$); Sierpinski gasket ($N = 29526$); GSHM fractal ($N = 4810$, $\langle k\rangle = 5.74$). Results were averaged over 300 initial conditions on 200 different graphs.

4 Adaptive walks: a simple bound 

If STP walks are not good explorers, then naturally the question arises: which ATP walks have good discovery properties? Due to their time dependence, ATP walks present a much wider array of possibilities, and their systematic treatment is a hard problem. Instead of tackling this general issue, here we first provide a simple upper bound for the mean edge discovery growth exponent, obeyed by any walk (ATP or STP). Note that for ATP walks it is not necessarily true that $\langle S_n\rangle$ or $\langle X_n\rangle$ obeys scaling with a single exponent until saturation. However, due to the constraints illustrated in Fig. 1(a,b), the 'local slopes' still obey $1 \ge d\langle X_n\rangle/dn \ge d\langle S_n\rangle/dn \ge 0$. Because the slopes vary, we define the mean growth exponents $\mu = \ln M/\ln T_E^{\times}$ and $\lambda = \ln N/\ln T_V^{\times}$. The bound is based on the observation that the edge cross-over time cannot be smaller than the node cross-over time, $T_E^{\times} \ge T_V^{\times}$. This provides the upper bound $\mu \le \ln M/\ln T_V^{\times}$. At $T_V^{\times}$, however, $S_{n=T_V^{\times}} \sim (T_V^{\times})^{\lambda} = N$, and thus $\ln T_V^{\times} = \lambda^{-1}\ln N$. Recall that $M \sim N^{\nu}$. Since the graph $G$ is connected, $1 \le \nu \le 2$. We therefore find that:

$$\nu\lambda \ge \mu \ge \lambda. \tag{9}$$

Hence a necessary condition for ATP walks to achieve mean growth exponents $\mu > \lambda$ is $\nu > 1$. If $\nu = 1$ (sparse graphs), clearly no walk (ATP or STP) can achieve $\mu > \lambda$. This is, for example, the case for all large networks ($N \to \infty$) with a fixed maximum degree $D$, since $M \le DN$ and thus $\nu = 1$. Inequalities (9) also show that the denser the graph, the larger the difference $\mu - \lambda$ can be, possibly achieved by sufficiently "smart" ATP walks. However, as $\nu \le 2$, the mean edge discovery growth exponent can never be larger than twice that for nodes, i.e., $2\lambda$.
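The bound (9) is simple arithmetic once $N$, $M$ and the node cross-over time are known. In the sketch below the cross-over times are hypothetical inputs chosen for illustration (about $N$ for a fast site explorer, about $N^2$ as for the EEM on $K_N$ discussed next); $\nu$ is estimated as $\ln M/\ln N$, so finite-size values of $\nu$ fall slightly below 2 for complete graphs.

```python
import math


def exponent_bounds(N, M, T_cross_V):
    """Mean node exponent lambda and the upper bound nu * lambda of Eq. (9).

    N, M: numbers of nodes and edges; T_cross_V: node cross-over time.
    The trivial bound mu <= 1 applies in addition.
    """
    lam = math.log(N) / math.log(T_cross_V)
    nu = math.log(M) / math.log(N)  # density exponent, M ~ N^nu
    return lam, nu * lam


N = 10**4
M = N * (N - 1) // 2  # complete graph
# Hypothetical cross-over times for illustration only.
for T in (N, N**2):
    lam, mu_max = exponent_bounds(N, M, T)
    print(f"T = {T:.0e}: lambda = {lam:.3f}, {lam:.3f} <= mu <= {mu_max:.3f}")
```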

5 The Edge Explorer Model (EEM) 

Next we introduce an ATP walk, the Edge Explorer Model, in which the transition probability to step onto a neighboring site depends on the visitation history of that site and its neighbors. The EEM is one of the simplest models that performs enhanced graph discovery compared to STP walks; many other variants can be devised and fine-tuned.

The immediate neighbors of a site $s$ can be divided into three sets: the set $V_{vv}$ of nodes that have already been visited and are connected to $s$ via visited edges, the set $V_{vu}$ of nodes that have already been visited but are connected to $s$ via unvisited edges, and the set $V_u$ of unvisited nodes (Fig. 4). If $V_{vu} \ne \emptyset$ (Fig. 4(a)), the walker steps to one of the nodes of this set, chosen uniformly at random. If, however, $V_{vu} = \emptyset$ but $V_{vv} \ne \emptyset$ (Fig. 4(b)), the walker chooses a node uniformly at random among the nodes within $V_{vv}$ that have at least one connection via unvisited edges to other visited nodes. If no such nodes exist, then the walker chooses a node uniformly at random from $V_u$.

While in general ATP walks do not lend themselves to analytical treatment, all the properties of the EEM on the complete graph $K_N$ can be obtained exactly. Due to its rules, the walker always discovers at least one edge in two steps (true only on $K_N$!), which means that edge exploration happens linearly in time, $\mu = 1$. Assuming that it has discovered $m-1$ nodes, it will not discover a new node until it has discovered all the links amongst these $m-1$ nodes, the discovered graph becoming $K_{m-1}$. Then it adds the $m$-th node, discovering the remaining $m-1$ edges in $O(m)$ steps, thus finishing discovering all the nodes in $O(\sum_m m) = O(N^2)$ steps. This means $T_V = O(N^2)$ and therefore $\lambda = \ln N/\ln T_V = 1/2$. Since on $K_N$ the EEM walker cannot get lost in visited regions, the corresponding cross-over times and cover times are practically the same. Fig. 5(a) shows $\langle S_n\rangle$ and $\langle X_n\rangle$ with these predicted features, including $\mu = 2\lambda = 1$. On $K_N$ the edge exploration is optimal in the sense that, for a given number of discovered nodes, the walker discovers the maximum number of edges possible up to that point, as shown in Fig. 5(b). Figs. 5(c,d) show the same for the EEM on ER graphs. As discussed above, the slope difference $\mu - \lambda$ increases with the graph density, defined as $\rho = \rho(G) = 2M/[N(N-1)] \le 1$. On sparse graphs, however, the EEM can get trapped in visited regions if these regions are clusters/communities separated from the rest of the graph by bottlenecks. Within these regions the EEM walker performs a simple random walk before it escapes. For this reason, on low-density graphs the EEM is not necessarily the optimal explorer. To illustrate the graph discovery fidelity of the EEM, in Fig. 6 we compare the densities $\rho_n = 2\langle X_n\rangle/[\langle S_n\rangle(\langle S_n\rangle - 1)]$ of the discovered graphs as a function of time $n$, generated on the same network by the EEM and by the simple random walk (for $K_N$ and ER). Clearly, the simple random walk greatly undershoots the true graph density (indicated by the horizontal red line), corresponding to $\langle X_n\rangle = O(\langle S_n\rangle)$ shown earlier, before it starts closing in on the true value $\rho(G)$; by contrast, the EEM shows a systematic and rapid approach to $\rho(G)$.

Figure 4: The Edge Explorer Model. (a) If there are already discovered neighbors $V_{vu}$ of site $s$ connected to $s$ through unvisited edges, the walker chooses one of them uniformly at random and moves to it. (b) If the visited neighbors are connected to $s$ through visited edges only ($V_{vu} = \emptyset$), the walker chooses uniformly at random one of those that have an unvisited connection to another visited node.

Figure 5: Site and edge exploration growth curves obtained for the EEM on complete graphs ($K_N$, $N = 100$; (a) and (b)) and Erdős-Rényi graphs (ER, $N = 1000$; (c) and (d)).
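The EEM rules translate directly into code. The sketch below is our reading of the rules as stated (neighbour sets $V_{vu}$, $V_{vv}$ and $V_u$, with an edge counted as visited once it has been traversed); the text does not specify what happens when $V_u$ is also empty, and we assume a simple random walk step in that case. On $K_N$ it allows one to check the property used above: a new node is reached only once the visited nodes already span a complete subgraph.

```python
import random


def eem_walk(adj, s0, n_steps, rng):
    """Simulate the Edge Explorer Model; return S_n and X_n for n = 0..n_steps.

    adj maps each node to a list of its neighbours (simple undirected graph).
    """
    visited, seen_edges = {s0}, set()

    def unvisited(u, v):
        return frozenset((u, v)) not in seen_edges

    s = s0
    S, X = [1], [0]
    for _ in range(n_steps):
        # Rule (a): visited neighbours reached through an unvisited edge (V_vu).
        v_vu = [v for v in adj[s] if v in visited and unvisited(s, v)]
        if v_vu:
            nxt = rng.choice(v_vu)
        else:
            # Rule (b): nodes of V_vv with an unvisited edge to another visited node.
            good = [v for v in adj[s] if v in visited
                    and any(w in visited and unvisited(v, w) for w in adj[v])]
            if good:
                nxt = rng.choice(good)
            else:
                v_u = [v for v in adj[s] if v not in visited]
                # Assumption (not specified in the text): if V_u is empty too,
                # take a simple random walk step.
                nxt = rng.choice(v_u if v_u else adj[s])
        seen_edges.add(frozenset((s, nxt)))
        visited.add(nxt)
        s = nxt
        S.append(len(visited))
        X.append(len(seen_edges))
    return S, X


N = 12
adj = {u: [v for v in range(N) if v != u] for u in range(N)}  # complete graph K_N
S, X = eem_walk(adj, 0, 400, random.Random(7))
# Whenever node number m is reached, the previous m - 1 nodes span K_{m-1}.
jumps = [(S[n], X[n - 1]) for n in range(1, len(S)) if S[n] > S[n - 1]]
print(jumps[:5], S[-1], X[-1])
```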



6 Discussion 

In summary, we have investigated properties of network discovery by walkers that follow edges (also called crawlers) in the most general setting. We have derived an exact expression for the generating function of the average number of discovered edges $\langle X_n\rangle$ as a function of time, for arbitrary graphs and STP walks. In particular, we have shown that for STP walkers both edge and node discovery follow the same scaling law on large networks, independently of the form of the stationary transition probabilities. Hence, the discovered network will be sparse (the number of discovered edges scaling linearly with that of the discovered nodes), presenting a strongly skewed structure compared to the underlying network if the latter is not sparse, $\nu > 1$. Only after a cross-over time $\sim O(N)$ will the edges become increasingly discovered, which in the case of large networks means unfeasibly long waiting times, eliminating STP walks as a useful methodology for faithful network discovery. Our results thus rigorously show that efficient/faithful discovery can only be done with adaptive walkers, which use time/history-dependent information for their transition probabilities (ATP). Visiting-history information can be thought of as "pheromone" trails on the network, which the walker uses through its rules for stepping onto the next site [29]. There is a plethora of possible rules using past history; however, to keep the memory requirements on a walker low (bounded), the desirable rules are those that only use information from the local neighborhood of the walker. In this vein, we have introduced a simple adaptive walk, the Edge Explorer Model, which is greedily biased towards already visited regions within a 2-step neighborhood. We have shown that on dense graphs the EEM performs near-optimally, or optimally (on $K_N$).

Figure 6: Comparison of the graph discovery performance of the EEM adaptive walk (green) and of the simple random walk (orange) on (a) $K_N$ and (b) ER graphs, shown via the time evolution of the discovered graph's density. The red line is the true density. For (a) $N = 100$; for (b) $N = 10^3$, $\langle k\rangle = 40$.

This project was supported in part by the Army Research Laboratory, ARL Cooperative Agreement Number W911NF-09-2-0053, HDTRA 201473-35045 and NSF BCS-0826958. The content of this document is that of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARL or the U.S. Government. The authors thank R.K.P. Zia, B. Szymanski, M. Ercsey-Ravasz and S. Sreenivasan for useful discussions.

A Derivation of $\mathcal{X}(s_0;\xi)$

Consider a connected simple graph $G(V, E)$ (there is a path along $G$'s edges between any two nodes), where $V$ denotes the set of nodes and $E$ the set of edges. Here we show the details of the calculations for various exploration properties of (general) random walks on $G$, the only constraint being that the walk is restricted to move along existing edges (no long-range hops are allowed). We rely heavily on the standard generating function technique [?, 30], and for that reason we briefly introduce the related definitions and basic results. In particular, if $A_j$ is an arbitrary series, then its generating function $A(\xi)$, with $|\xi| < 1$, is defined as:

$$A(\xi) = \sum_{n=0}^{\infty}A_n\xi^n. \tag{10}$$

Knowing $A(\xi)$, the elements of the series $A_n$ are recovered by inverting (10):

$$A_n = \oint_{\mathcal{C}}\frac{d\xi}{2\pi i}\,\frac{A(\xi)}{\xi^{n+1}}, \tag{11}$$

where $\mathcal{C}$ is a counterclockwise contour around $\xi = 0$ inside the unit disk. We are going to make use of the following expressions:

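$$\sum_{j=0}^{n}A_j = \oint_{\mathcal{C}}\frac{d\xi}{2\pi i}\,\frac{A(\xi)}{(1-\xi)\,\xi^{n+1}} \qquad\text{and}\qquad \sum_{j=0}^{n}A_jB_{n-j} = \oint_{\mathcal{C}}\frac{d\xi}{2\pi i}\,\frac{A(\xi)B(\xi)}{\xi^{n+1}}. \tag{12}$$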
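By (11), the identities (12) are equivalent to the coefficient statements "partial sums have generating function $A(\xi)/(1-\xi)$" and "convolutions have generating function $A(\xi)B(\xi)$". The sketch below checks both numerically for arbitrary test sequences of our choosing (a finite $A_n$ and a geometric $B_n$), truncated at 80 terms, which is ample for $\xi = 0.4$.

```python
def gf(seq, xi):
    """Generating function A(xi) = sum_n A_n xi^n of a finite sequence."""
    return sum(a * xi**n for n, a in enumerate(seq))


n_terms = 80
A = [1.0, 2.0, 0.5, 3.0, 1.5, 0.25] + [0.0] * (n_terms - 6)
B = [0.5**n for n in range(n_terms)]
xi = 0.4

partial_sums = [sum(A[: n + 1]) for n in range(n_terms)]
convolution = [sum(A[j] * B[n - j] for j in range(n + 1)) for n in range(n_terms)]

print(gf(partial_sums, xi), gf(A, xi) / (1 - xi))  # first identity in (12)
print(gf(convolution, xi), gf(A, xi) * gf(B, xi))   # second identity in (12)
```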



On many occasions, the inversion integral in (11) cannot be performed analytically. However, we are usually interested in the long-time limit $n \gg 1$ of these quantities, and for that we use the discrete Tauberian theorem, which allows one to estimate the leading-order term, or the Darboux theorem, which can also produce terms beyond the leading order [?].

Let $P_n(s|s_0)$ denote the probability of the walker being at site $s$ on the $n$-th step given that it started from site $s_0$, and let $F_n(s|s_0)$ denote the probability of the walker visiting site $s$ for the first time on the $n$-th step, given that it started from $s_0$. The corresponding generating functions are $P(s|s_0;\xi)$ and $F(s|s_0;\xi)$ (note that $F(s|s_0;\xi) = \sum_{n=1}^{\infty}\xi^n F_n(s|s_0)$). Partitioning over the last step and over the first step gives two useful recursion relations ($r, s_0 \in V$):

$$P_{n+1}(r|s_0) = \sum_{r'\in V}p(r|r')P_n(r'|s_0), \qquad P_{n+1}(r|s_0) = \sum_{r'\in V}P_n(r|r')p(r'|s_0). \tag{13}$$

In terms of generating functions:

$$P(r|s_0;\xi) = \delta_{rs_0} + \xi\sum_{r'\in V}p(r|r')P(r'|s_0;\xi), \tag{14}$$

$$P(r|s_0;\xi) = \delta_{rs_0} + \xi\sum_{r'\in V}P(r|r';\xi)p(r'|s_0). \tag{15}$$

These identities can be used to derive a relationship between the site occupancy generating function and the first-passage time generating function, valid for all connected graphs and all walks:

$$F(s|s_0;\xi) = \frac{P(s|s_0;\xi) - \delta_{ss_0}}{P(s|s;\xi)}. \tag{16}$$

In the following we derive Eqs. (5) and (6) of the main paper. Let $e = (s, s') \in E$ be an edge in $G$, and let $F_n(e|s_0)$ denote the probability that the walker passes through $e$ for the first time on the $n$-th step, given that it commenced from $s_0 \in V$. Let us denote by $X_n(s_0)$ the number of distinct edges visited during an $n$-step walk that commenced from site $s_0$. We introduce the indicator $\Gamma_n(s_0) \in \{0, 1\}$ for the number of virgin edges discovered on the $n$-th step:

$$\langle\Gamma_n(s_0)\rangle = \mathrm{Prob}\{\Gamma_n(s_0) = 1\} = \sum_{e\in E}F_n(e|s_0). \tag{17}$$

By convention we take $\Gamma_0 = \langle\Gamma_0\rangle = 0$. Thus we can write:

$$X_n(s_0) = \sum_{j=0}^{n}\Gamma_j(s_0), \qquad \langle X_n(s_0)\rangle = \sum_{j=0}^{n}\langle\Gamma_j(s_0)\rangle. \tag{18}$$

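The corresponding generating function is:

$$\mathcal{X}(s_0;\xi) = \sum_{n=0}^{\infty}\xi^n\langle X_n(s_0)\rangle = \sum_{n=0}^{\infty}\xi^n\sum_{j=0}^{n}\langle\Gamma_j(s_0)\rangle = \frac{\Gamma(s_0;\xi)}{1-\xi}, \tag{19}$$

where $\Gamma(s_0;\xi)$ is the generating function for the indicator and we used the first identity in (12). From (17):

$$\Gamma(s_0;\xi) = \sum_{e\in E}\sum_{n=1}^{\infty}\xi^n F_n(e|s_0) = \sum_{e\in E}F(e|s_0;\xi). \tag{20}$$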

We need to calculate the edge first-passage probabilities $F_n(e|s_0)$, or their generating function $F(e|s_0;\xi)$. Clearly, $F_n(e|s_0)$ contains all the paths commencing from $s_0$ that never crossed edge $e$ during the first $n-1$ steps but do so on the $n$-th step.

To calculate $F(e|s_0;\xi)$, we introduce an auxiliary node $z$ placed on the edge $e$, as described in the main paper (Fig. 1(c)), and consider the random walk on this extended graph $G_z$. The node set of this graph is $V_z = V \cup \{z\}$ and the edge set is $E_z = E \setminus \{(s, s')\} \cup \{(s, z), (z, s')\}$. Let $P^{\dagger}_n(r|s_1)$ be the site occupation probability and $F^{\dagger}_n(z|s_1)$ the corresponding first-passage time distribution for the random walk on $G_z$, with $r, s_1 \in V_z$. Then the first-passage probability through edge $e$ on the graph $G$ is identical to the site first-passage probability of the new walk to site $z$ on the graph $G_z$:

$$F_n(e|s_0) = F^{\dagger}_n(z|s_0), \qquad s_0 \in V. \tag{21}$$

The corresponding generating function takes the form:

$$F(e|s_0;\xi) = F^{\dagger}(z|s_0;\xi) = \frac{P^{\dagger}(z|s_0;\xi)}{P^{\dagger}(z|z;\xi)}, \qquad s_0 \in V, \tag{22}$$

where we used the general identity (16) for the new walk on the new graph $G_z$.

In order to fully specify the new walk on $G_z$ we need to define the corresponding single-step transition probabilities. Away from the nodes $s$ and $s'$, the single-step probabilities of the new walk are identical to those of the old walk, including the probabilities of stepping onto $s$ and onto $s'$ from nodes other than $s'$ and $s$, respectively. Direct steps from $s$ to $s'$ (or from $s'$ to $s$), however, are forbidden in the new walk: the walker has to step onto site $z$ first, before it can step further to the other site ($s$ or $s'$). We also have to make sure that the probability flow from nodes $s$ and $s'$ towards $z$ in the new walk is the same as the flow along edge $e$ in the old walk. From node $z$ the walker can only step to $s$ or $s'$, with probabilities $f$ and $g$, respectively. The probabilities $f$ and $g$ are arbitrary, and the independence of the final expression for $F(e|s_0;\xi)$ of these two variables serves as a good test of the correctness of our calculations. Eq. (4) of the main paper provides the condensed form of the single-step transition probabilities for the new walk on $G_z$; it combines the following cases:

$$p^\dagger(s|s')=p^\dagger(s'|s)=0, \tag{23}$$

$$p^\dagger(s|z)=f,\qquad p^\dagger(s'|z)=g,\qquad p^\dagger(r|z)=0\ \text{for}\ r\in V_z\setminus\{s,s'\}, \tag{24}$$

$$p^\dagger(z|s)=p(s'|s),\qquad p^\dagger(z|s')=p(s|s'),\qquad p^\dagger(z|r')=0\ \text{for}\ r'\in V_z\setminus\{s,s'\}, \tag{25}$$

$$p^\dagger(r|r')=p(r|r')+q(r|r'),\qquad q(r|r')=-\left(\delta_{rs'}\delta_{r's}+\delta_{rs}\delta_{r's'}\right)p(r|r'),\qquad r,r'\in V. \tag{26}$$
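The construction (23)-(26) and the identity (21) can be checked numerically. The sketch below rests on the same illustrative assumptions (hypothetical four-node graph, simple random walk, helper names of our own). It stores transition probabilities column-wise as `T[r][r'] = p(r|r')`, builds the walk on $G_z$ for a few choices of $f$, verifies that every column is normalized, and compares first passage to $z$ on $G_z$ with first passage through $e$ on $G$.

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk; T[r][r'] = p(r|r') (column-stochastic).
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)
T = [[Fraction(1, len(ADJ[c])) if r in ADJ[c] else Fraction(0) for c in range(N)]
     for r in range(N)]


def extended(T, s, s2, f):
    """Transition matrix of the walk on G_z, Eqs. (23)-(26); z gets the last index."""
    n = len(T)
    Td = [row[:] + [Fraction(0)] for row in T] + [[Fraction(0)] * (n + 1)]
    Td[s2][s] = Td[s][s2] = Fraction(0)       # (23): no direct s <-> s' steps
    Td[n][s], Td[n][s2] = T[s2][s], T[s][s2]  # (25): p(z|s) = p(s'|s), p(z|s') = p(s|s')
    Td[s][n], Td[s2][n] = f, 1 - f            # (24): p(s|z) = f, p(s'|z) = g = 1 - f
    return Td


def first_passage_to(Tm, target, s0, nmax):
    """[F_1(target|s0), ..., F_nmax(target|s0)] for the chain Tm, absorbing at target."""
    n = len(Tm)
    alive = [Fraction(int(r == s0)) for r in range(n)]
    F = []
    for _ in range(nmax):
        nxt = [sum(Tm[r][c] * alive[c] for c in range(n)) for r in range(n)]
        F.append(nxt[target])
        nxt[target] = Fraction(0)  # walkers that reached target are removed
        alive = nxt
    return F


def edge_first_passage(s, s2, s0, nmax):
    """F_n(e|s0) on G itself: flow across e = (s, s') in either direction, then removed."""
    alive = [Fraction(int(r == s0)) for r in range(N)]
    F = []
    for _ in range(nmax):
        F.append(T[s2][s] * alive[s] + T[s][s2] * alive[s2])
        nxt = [sum(T[r][c] * alive[c] for c in range(N)) for r in range(N)]
        nxt[s2] -= T[s2][s] * alive[s]
        nxt[s] -= T[s][s2] * alive[s2]
        alive = nxt
    return F


for f in (Fraction(0), Fraction(1, 3), Fraction(1)):
    for s, s2 in [(0, 1), (1, 2), (2, 3)]:
        Td = extended(T, s, s2, f)
        # every column (source node) of the new walk is normalized
        assert all(sum(Td[r][c] for r in range(N + 1)) == 1 for c in range(N + 1))
        for s0 in range(N):  # Eq. (21), independently of f
            assert first_passage_to(Td, N, s0, 8) == edge_first_passage(s, s2, s0, 8)
```

Because $z$ is absorbing in `first_passage_to`, the values of $f$ and $g$ never enter the first-passage probabilities, which is consistent with the expected independence of the final result from these two parameters.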

The pseudo-node $z$ uniquely characterizes the edge $e=(s,s')\in E$. As a reminder of this identification, we write $z$ instead of $e$ in the remainder. From (22) and (20) it follows that

$$\Gamma(s_0;\xi)=\sum_{z}\frac{P^\dagger(z|s_0;\xi)}{P^\dagger(z|z;\xi)},\qquad s_0\in V. \tag{27}$$



The sum is over all the edges of the original graph $G$. Thus we need to compute the site occupation probabilities for the new walk on the extended graph. The relationships based on the partition over the last step (14) and over the first step (15) hold for the new walk as well:

$$P^\dagger(r|s_1;\xi)=\delta_{rs_1}+\xi\sum_{r'\in V_z}p^\dagger(r|r')\,P^\dagger(r'|s_1;\xi),\qquad r,s_1\in V_z, \tag{28}$$

$$P^\dagger(r|s_1;\xi)=\delta_{rs_1}+\xi\sum_{r'\in V_z}P^\dagger(r|r';\xi)\,p^\dagger(r'|s_1),\qquad r,s_1\in V_z. \tag{29}$$

From (28) with $r=z$ and $s_1=s_0$, and using (25), one obtains:

$$P^\dagger(z|s_0;\xi)=\xi p(s|s')P^\dagger(s'|s_0;\xi)+\xi p(s'|s)P^\dagger(s|s_0;\xi),\qquad s_0\in V. \tag{30}$$

Eq. (29) with $r=s_1=z$ and (24) yields:

$$P^\dagger(z|z;\xi)=1+\xi fP^\dagger(z|s;\xi)+\xi gP^\dagger(z|s';\xi). \tag{31}$$



Since (30) is valid for any $s_0\in V$, we first make the substitution $s_0\mapsto s$ and then $s_0\mapsto s'$ to get

$$P^\dagger(z|s;\xi)=\xi p(s|s')P^\dagger(s'|s;\xi)+\xi p(s'|s)P^\dagger(s|s;\xi), \tag{32}$$

$$P^\dagger(z|s';\xi)=\xi p(s|s')P^\dagger(s'|s';\xi)+\xi p(s'|s)P^\dagger(s|s';\xi). \tag{33}$$

Inserting these into (31) one obtains:

$$P^\dagger(z|z;\xi)=1+\xi^2p(s|s')\left[fP^\dagger(s'|s;\xi)+gP^\dagger(s'|s';\xi)\right]+\xi^2p(s'|s)\left[fP^\dagger(s|s;\xi)+gP^\dagger(s|s';\xi)\right]. \tag{34}$$
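A minimal exact check of (30), (31) and (34), assuming the same hypothetical test graph and simple random walk: the generating functions of the walk on $G_z$ are obtained by solving $(I-\xi T^\dagger)\,x=e_{s_1}$, whose solution is the column $P^\dagger(\cdot|s_1;\xi)$, at arbitrarily chosen rational values of $\xi$ and $f$. The helpers `solve`, `gen_fun` and `extended` are our own illustrative names.

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk; T[r][r'] = p(r|r').
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)
T = [[Fraction(1, len(ADJ[c])) if r in ADJ[c] else Fraction(0) for c in range(N)]
     for r in range(N)]


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(Tm, xi):
    """cols[s1][r] = [(I - xi*Tm)^(-1)]_{r,s1}: the generating function P(r|s1; xi)."""
    n = len(Tm)
    M = [[int(r == c) - xi * Tm[r][c] for c in range(n)] for r in range(n)]
    return [solve(M, [int(r == s1) for r in range(n)]) for s1 in range(n)]


def extended(Tm, s, s2, f):
    """Transition matrix of the walk on G_z, Eqs. (23)-(26); z gets the last index."""
    n = len(Tm)
    Td = [row[:] + [Fraction(0)] for row in Tm] + [[Fraction(0)] * (n + 1)]
    Td[s2][s] = Td[s][s2] = Fraction(0)
    Td[n][s], Td[n][s2] = Tm[s2][s], Tm[s][s2]
    Td[s][n], Td[s2][n] = f, 1 - f
    return Td


s, s2, z = 0, 2, N                       # edge e = (0, 2); z is the extra index N
f, xi = Fraction(1, 4), Fraction(1, 2)   # arbitrary illustrative values
g = 1 - f
Pd = gen_fun(extended(T, s, s2, f), xi)  # Pd[s1][r] = P_dag(r|s1; xi)
alpha, beta = T[s2][s], T[s][s2]         # p(s'|s), p(s|s')

for s0 in range(N):  # Eq. (30)
    assert Pd[s0][z] == xi * beta * Pd[s0][s2] + xi * alpha * Pd[s0][s]
assert Pd[z][z] == 1 + xi * f * Pd[s][z] + xi * g * Pd[s2][z]  # Eq. (31)
rhs34 = (1 + xi**2 * beta * (f * Pd[s][s2] + g * Pd[s2][s2])
         + xi**2 * alpha * (f * Pd[s][s] + g * Pd[s2][s]))
assert Pd[z][z] == rhs34  # Eq. (34)
```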
Eq. (34) shows that we need to calculate the site occupation probability generating functions $P^\dagger(r|s_0;\xi)$ for $r,s_0\in V$. In order to derive $P^\dagger(r|s_0;\xi)$ for $r,s_0\in V$, we first make the replacements $r'\mapsto r''$, $r\mapsto r'$, $s_1\mapsto s_0\in V$ in (28) to obtain:

$$P^\dagger(r'|s_0;\xi)=\delta_{r's_0}+\xi\sum_{r''\in V_z}p^\dagger(r'|r'')\,P^\dagger(r''|s_0;\xi). \tag{35}$$

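With the help of (26) we then find:

$$P^\dagger(r'|s_0;\xi)=\delta_{r's_0}+\xi\sum_{r''\in V}\left[p(r'|r'')+q(r'|r'')\right]P^\dagger(r''|s_0;\xi)+\xi p^\dagger(r'|z)\,P^\dagger(z|s_0;\xi),$$

or, after rearranging,

$$P^\dagger(r'|s_0;\xi)-\xi\sum_{r''\in V}p(r'|r'')P^\dagger(r''|s_0;\xi)=\delta_{r's_0}+\xi\sum_{r''\in V}q(r'|r'')P^\dagger(r''|s_0;\xi)+\xi p^\dagger(r'|z)\,P^\dagger(z|s_0;\xi). \tag{36}$$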

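After multiplying both sides by $P(r|r';\xi)$, $r\in V$, and summing both sides over $r'\in V$, the above equation takes the form

$$\begin{aligned}
&\sum_{r'\in V}P(r|r';\xi)P^\dagger(r'|s_0;\xi)-\sum_{r''\in V}P^\dagger(r''|s_0;\xi)\,\xi\sum_{r'\in V}P(r|r';\xi)p(r'|r'')=P(r|s_0;\xi)\\
&\qquad+\sum_{r''\in V}P^\dagger(r''|s_0;\xi)\,\xi\sum_{r'\in V}P(r|r';\xi)q(r'|r'')+\xi\left[fP(r|s;\xi)+gP(r|s';\xi)\right]P^\dagger(z|s_0;\xi).
\end{aligned}\tag{37}$$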



Next we calculate the left-hand side of (37) by using (15) to write:

$$\xi\sum_{r'\in V}P(r|r';\xi)\,p(r'|r'')=P(r|r'';\xi)-\delta_{rr''}.$$

When this is inserted into the left-hand side of (37), the sums containing the dagger terms cancel and one simply obtains $P^\dagger(r|s_0;\xi)$. The sums on the right-hand side can be written in a simpler form after introducing the notation:

$$\Lambda(r|r'';\xi)=\xi\sum_{r'\in V}P(r|r';\xi)\,q(r'|r''). \tag{38}$$

Thus, Eq. (37) becomes:

$$P^\dagger(r|s_0;\xi)=P(r|s_0;\xi)+\xi\left[fP(r|s;\xi)+gP(r|s';\xi)\right]P^\dagger(z|s_0;\xi)+\sum_{r''\in V}\Lambda(r|r'';\xi)\,P^\dagger(r''|s_0;\xi). \tag{39}$$

Using (26) we find:

$$\Lambda(r|r'';\xi)=-\xi P(r|s;\xi)\,p(s|s')\,\delta_{r''s'}-\xi P(r|s';\xi)\,p(s'|s)\,\delta_{r''s}. \tag{40}$$






A. Asztalos and Z. Toroczkai 



Network Discovery by Generalized Random Walks 



Inserting this into (39) one obtains:

$$P^\dagger(r|s_0;\xi)=P(r|s_0;\xi)-\xi P(r|s;\xi)p(s|s')P^\dagger(s'|s_0;\xi)-\xi P(r|s';\xi)p(s'|s)P^\dagger(s|s_0;\xi)+\xi\left[fP(r|s;\xi)+gP(r|s';\xi)\right]P^\dagger(z|s_0;\xi),\qquad r\in V. \tag{41}$$

Replacing $r\mapsto s$ and then $r\mapsto s'$ in the equation above yields:



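$$\begin{aligned}
a_{11}P^\dagger(s|s_0;\xi)+a_{12}P^\dagger(s'|s_0;\xi)+a_{13}P^\dagger(z|s_0;\xi)&=P(s|s_0;\xi),\\
a_{21}P^\dagger(s|s_0;\xi)+a_{22}P^\dagger(s'|s_0;\xi)+a_{23}P^\dagger(z|s_0;\xi)&=P(s'|s_0;\xi),
\end{aligned}\tag{42}$$

where:

$$\begin{aligned}
a_{11}&=1+\xi p(s'|s)P(s|s';\xi), & a_{21}&=\xi p(s'|s)P(s'|s';\xi),\\
a_{12}&=\xi p(s|s')P(s|s;\xi), & a_{22}&=1+\xi p(s|s')P(s'|s;\xi),\\
a_{13}&=-\xi\left[fP(s|s;\xi)+gP(s|s';\xi)\right], & a_{23}&=-\xi\left[fP(s'|s;\xi)+gP(s'|s';\xi)\right].
\end{aligned}\tag{43}$$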
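The following sketch, again on the hypothetical test graph with a simple random walk and arbitrary rational $\xi$, $f$, checks (41) for every $r\in V$ and the two rows (42) with coefficients (43), against generating functions computed directly on $G_z$. Helper names are illustrative; the variables `a`, `b`, `c`, `d` follow the shorthands later introduced in (50)-(52).

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk; T[r][r'] = p(r|r').
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)
T = [[Fraction(1, len(ADJ[c])) if r in ADJ[c] else Fraction(0) for c in range(N)]
     for r in range(N)]


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(Tm, xi):
    """cols[s1][r] = [(I - xi*Tm)^(-1)]_{r,s1}: the generating function P(r|s1; xi)."""
    n = len(Tm)
    M = [[int(r == c) - xi * Tm[r][c] for c in range(n)] for r in range(n)]
    return [solve(M, [int(r == s1) for r in range(n)]) for s1 in range(n)]


def extended(Tm, s, s2, f):
    """Transition matrix of the walk on G_z, Eqs. (23)-(26); z gets the last index."""
    n = len(Tm)
    Td = [row[:] + [Fraction(0)] for row in Tm] + [[Fraction(0)] * (n + 1)]
    Td[s2][s] = Td[s][s2] = Fraction(0)
    Td[n][s], Td[n][s2] = Tm[s2][s], Tm[s][s2]
    Td[s][n], Td[s2][n] = f, 1 - f
    return Td


s, s2, z = 1, 2, N                       # edge e = (1, 2)
f, xi = Fraction(2, 5), Fraction(3, 4)   # arbitrary illustrative values
g = 1 - f
P = gen_fun(T, xi)                       # old walk: P[s0][r] = P(r|s0; xi)
Pd = gen_fun(extended(T, s, s2, f), xi)  # new walk: Pd[s0][r] = P_dag(r|s0; xi)
alpha, beta = T[s2][s], T[s][s2]                     # p(s'|s), p(s|s')
a, b, c, d = P[s2][s], P[s][s], P[s2][s2], P[s][s2]  # shorthands of (51)-(52)

for s0 in range(N):
    for r in range(N):  # Eq. (41) for every r in V
        rhs = (P[s0][r]
               - xi * P[s][r] * beta * Pd[s0][s2]
               - xi * P[s2][r] * alpha * Pd[s0][s]
               + xi * (f * P[s][r] + g * P[s2][r]) * Pd[s0][z])
        assert Pd[s0][r] == rhs
    # Eq. (42) with coefficients (43): the cases r = s and r = s'
    assert ((1 + xi * alpha * a) * Pd[s0][s] + xi * beta * b * Pd[s0][s2]
            - xi * (f * b + g * a) * Pd[s0][z]) == P[s0][s]
    assert (xi * alpha * c * Pd[s0][s] + (1 + xi * beta * d) * Pd[s0][s2]
            - xi * (f * d + g * c) * Pd[s0][z]) == P[s0][s2]
```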



System (42) needs a third equation to solve for $\{P^\dagger(s|s_0;\xi),P^\dagger(s'|s_0;\xi),P^\dagger(z|s_0;\xi)\}$. The third equation is just (30):

$$a_{31}P^\dagger(s|s_0;\xi)+a_{32}P^\dagger(s'|s_0;\xi)+a_{33}P^\dagger(z|s_0;\xi)=0, \tag{44}$$

with:

$$a_{31}=\xi p(s'|s),\qquad a_{32}=\xi p(s|s'),\qquad a_{33}=-1. \tag{45}$$
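Thus, if we introduce the column vectors:

$$\mathbf{P}^\dagger(s,s',z|s_0;\xi)=\begin{pmatrix}P^\dagger(s|s_0;\xi)\\ P^\dagger(s'|s_0;\xi)\\ P^\dagger(z|s_0;\xi)\end{pmatrix},\qquad [\mathbf{P}](s,s'|s_0;\xi)=\begin{pmatrix}P(s|s_0;\xi)\\ P(s'|s_0;\xi)\\ 0\end{pmatrix}, \tag{46}$$

and denote by $A$ the $3\times3$ matrix with elements defined above by (43) and (45), then the linear equation to be solved is simply:

$$A\,\mathbf{P}^\dagger(s,s',z|s_0;\xi)=[\mathbf{P}](s,s'|s_0;\xi). \tag{47}$$

Assuming that $A$ is invertible, the solution is

$$\mathbf{P}^\dagger(s,s',z|s_0;\xi)=A^{-1}[\mathbf{P}](s,s'|s_0;\xi). \tag{48}$$

The matrix explicitly looks as follows:

$$A=\begin{pmatrix}1+\xi\alpha a & \xi\beta b & -\xi(ga+fb)\\ \xi\alpha c & 1+\xi\beta d & -\xi(gc+fd)\\ \xi\alpha & \xi\beta & -1\end{pmatrix}, \tag{49}$$

where we introduced the shorthand notations:

$$\alpha=p(s'|s),\qquad \beta=p(s|s'), \tag{50}$$

$$a=a(\xi)=P(s|s';\xi),\qquad b=b(\xi)=P(s|s;\xi), \tag{51}$$

$$c=c(\xi)=P(s'|s';\xi),\qquad d=d(\xi)=P(s'|s;\xi). \tag{52}$$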
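Solving the $3\times3$ system (47) should reproduce the generating functions that a brute-force inversion over the whole extended graph gives. A sketch under the same assumptions (hypothetical graph, simple random walk, arbitrary rational $\xi$ and $f$, the edge $e=(2,3)$ picked for illustration, helper names our own):

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk; T[r][r'] = p(r|r').
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)
T = [[Fraction(1, len(ADJ[c])) if r in ADJ[c] else Fraction(0) for c in range(N)]
     for r in range(N)]


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(Tm, xi):
    """cols[s1][r] = [(I - xi*Tm)^(-1)]_{r,s1}: the generating function P(r|s1; xi)."""
    n = len(Tm)
    M = [[int(r == c) - xi * Tm[r][c] for c in range(n)] for r in range(n)]
    return [solve(M, [int(r == s1) for r in range(n)]) for s1 in range(n)]


def extended(Tm, s, s2, f):
    """Transition matrix of the walk on G_z, Eqs. (23)-(26); z gets the last index."""
    n = len(Tm)
    Td = [row[:] + [Fraction(0)] for row in Tm] + [[Fraction(0)] * (n + 1)]
    Td[s2][s] = Td[s][s2] = Fraction(0)
    Td[n][s], Td[n][s2] = Tm[s2][s], Tm[s][s2]
    Td[s][n], Td[s2][n] = f, 1 - f
    return Td


s, s2, z = 2, 3, N                       # edge e = (2, 3), ending at the pendant node
f, xi = Fraction(2, 3), Fraction(3, 4)   # arbitrary illustrative values
g = 1 - f
P = gen_fun(T, xi)
Pd = gen_fun(extended(T, s, s2, f), xi)
alpha, beta = T[s2][s], T[s][s2]                     # (50)
a, b, c, d = P[s2][s], P[s][s], P[s2][s2], P[s][s2]  # (51)-(52)
A = [[1 + xi * alpha * a, xi * beta * b, -xi * (g * a + f * b)],  # (49)
     [xi * alpha * c, 1 + xi * beta * d, -xi * (g * c + f * d)],
     [xi * alpha, xi * beta, -1]]

for s0 in range(N):
    sol = solve(A, [P[s0][s], P[s0][s2], 0])  # Eq. (48)
    assert sol == [Pd[s0][s], Pd[s0][s2], Pd[s0][z]]
```

The point of the reduction is that only old-walk quantities ($P$ on $G$) enter `A` and the right-hand side, so a single inversion on $G$ serves every edge.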
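The inverse of matrix $A$ is just

$$A^{-1}=\frac{1}{D}\begin{pmatrix}-1-\beta(d-v\xi)\xi & \beta(b-u\xi)\xi & (u+t\beta g\xi)\xi\\ \alpha(c-v\xi)\xi & -1-\alpha(a-u\xi)\xi & (v+t\alpha f\xi)\xi\\ -\alpha(1+\tilde v\beta\xi)\xi & -\beta(1+\tilde u\alpha\xi)\xi & 1+w\xi+t\alpha\beta\xi^2\end{pmatrix}, \tag{53}$$

where the determinant is:

$$D=-1-w\xi+(u\alpha+v\beta-t\alpha\beta)\xi^2+t\alpha\beta(f+g)\xi^3, \tag{54}$$

and we introduced the notations:

$$u=ga+fb,\quad \tilde u=a-b,\quad v=fd+gc,\quad \tilde v=d-c,\quad t=ad-bc,\quad w=\alpha a+d\beta. \tag{55}$$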



The final solutions can easily be read off from (48). Note that these are now expressed solely in terms of the generating functions for the site occupation probabilities of the old walk on the old graph $G$. In particular, for $P^\dagger(z|s_0;\xi)$ we get:

$$P^\dagger(z|s_0;\xi)=-\frac{\alpha\xi}{D}(1+\tilde v\beta\xi)P(s|s_0;\xi)-\frac{\beta\xi}{D}(1+\tilde u\alpha\xi)P(s'|s_0;\xi). \tag{56}$$
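Under the same illustrative assumptions (hypothetical graph, simple random walk, arbitrary rational $\xi$, $f$, helper names our own), the closed form (54) can be compared with a cofactor expansion of (49), and (56) with the generating functions of the walk on $G_z$. In the code, `ut` and `vt` stand for $\tilde u$ and $\tilde v$ of (55).

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk; T[r][r'] = p(r|r').
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)
T = [[Fraction(1, len(ADJ[c])) if r in ADJ[c] else Fraction(0) for c in range(N)]
     for r in range(N)]


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(Tm, xi):
    """cols[s1][r] = [(I - xi*Tm)^(-1)]_{r,s1}: the generating function P(r|s1; xi)."""
    n = len(Tm)
    M = [[int(r == c) - xi * Tm[r][c] for c in range(n)] for r in range(n)]
    return [solve(M, [int(r == s1) for r in range(n)]) for s1 in range(n)]


def extended(Tm, s, s2, f):
    """Transition matrix of the walk on G_z, Eqs. (23)-(26); z gets the last index."""
    n = len(Tm)
    Td = [row[:] + [Fraction(0)] for row in Tm] + [[Fraction(0)] * (n + 1)]
    Td[s2][s] = Td[s][s2] = Fraction(0)
    Td[n][s], Td[n][s2] = Tm[s2][s], Tm[s][s2]
    Td[s][n], Td[s2][n] = f, 1 - f
    return Td


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


s, s2, z = 0, 1, N                       # edge e = (0, 1)
f, xi = Fraction(1, 3), Fraction(2, 3)   # arbitrary illustrative values
g = 1 - f
P = gen_fun(T, xi)
Pd = gen_fun(extended(T, s, s2, f), xi)
alpha, beta = T[s2][s], T[s][s2]
a, b, c, d = P[s2][s], P[s][s], P[s2][s2], P[s][s2]
A = [[1 + xi * alpha * a, xi * beta * b, -xi * (g * a + f * b)],
     [xi * alpha * c, 1 + xi * beta * d, -xi * (g * c + f * d)],
     [xi * alpha, xi * beta, -1]]
u, ut, v, vt = g * a + f * b, a - b, f * d + g * c, d - c  # notation (55)
t, w = a * d - b * c, alpha * a + d * beta
D = (-1 - w * xi + (u * alpha + v * beta - t * alpha * beta) * xi**2
     + t * alpha * beta * (f + g) * xi**3)                 # Eq. (54)
assert det3(A) == D
for s0 in range(N):                                         # Eq. (56)
    Pz = (-(alpha * xi / D) * (1 + vt * beta * xi) * P[s0][s]
          - (beta * xi / D) * (1 + ut * alpha * xi) * P[s0][s2])
    assert Pz == Pd[s0][z]
```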



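In order to obtain the site occupation probability generating functions $P^\dagger(s|s;\xi)$, $P^\dagger(s'|s;\xi)$, $P^\dagger(s|s';\xi)$ and $P^\dagger(s'|s';\xi)$ required by the right-hand side of (34), we merely substitute $s$ and $s'$, respectively, for $s_0$ in the expressions for the solutions. We thus obtain:

$$P^\dagger(s|s;\xi)=-\frac{1}{D}\left(b+t\beta g\xi^2\right),\qquad P^\dagger(s|s';\xi)=-\frac{1}{D}\left(a+t\beta\xi-t\beta f\xi^2\right), \tag{57}$$

$$P^\dagger(s'|s;\xi)=-\frac{1}{D}\left(d+t\alpha\xi-t\alpha g\xi^2\right),\qquad P^\dagger(s'|s';\xi)=-\frac{1}{D}\left(c+t\alpha f\xi^2\right). \tag{58}$$

Inserting these into (34) gives

$$P^\dagger(z|z;\xi)=-\frac{1}{D}\left(1+w\xi+t\alpha\beta\xi^2\right). \tag{59}$$

Next, from Eqs. (56), (59) and (27) we find:

$$\Gamma(s_0;\xi)=\sum_{z}\frac{\alpha\xi\left[1+(d-c)\beta\xi\right]P(s|s_0;\xi)+\beta\xi\left[1+(a-b)\alpha\xi\right]P(s'|s_0;\xi)}{1+(\alpha a+d\beta)\xi+(ad-bc)\alpha\beta\xi^2}. \tag{60}$$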



Note that this expression is independent of the variables $f$ and $g$, as anticipated! Now, using (19), the generating function for the average number of discovered distinct edges becomes:

$$X(s_0;\xi)=\frac{\xi}{2(1-\xi)}\sum_{s,s'\in V}\frac{\alpha\left[1+(d-c)\beta\xi\right]P(s|s_0;\xi)+\beta\left[1+(a-b)\alpha\xi\right]P(s'|s_0;\xi)}{1+(\alpha a+d\beta)\xi+(ad-bc)\alpha\beta\xi^2}. \tag{61}$$
Taking a closer look at this expression, one observes that only those $(s,s')$ pairs contribute to the sum which are neighbors on $G$ (since $\alpha$ and $\beta$ are zero for transitions along non-edges). After adding the sum to itself and then interchanging the dummy variables $s,s'$ in one of the sums, we finally obtain Eqs. (5), (6) of the main paper:

$$X(s_0;\xi)=\frac{\xi}{1-\xi}\sum_{s\in V}W(s;\xi)P(s|s_0;\xi), \tag{62}$$

with

$$W(s;\xi)=\sum_{s'\in V}\alpha\,\frac{1+(d-c)\beta\xi}{1+(\alpha a+d\beta)\xi+(ad-bc)\alpha\beta\xi^2}. \tag{63}$$
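The end result (62)-(63) can be tested against an independent computation. The sketch below (hypothetical test graph, simple random walk, all names illustrative) evaluates (62)-(63) exactly and compares it with $\Gamma(s_0;\xi)/(1-\xi)$ from (19), where $\Gamma$ is accumulated in floating point from a first-passage recursion truncated after 400 steps; the truncation is an assumption that is harmless for the values $\xi\le1/2$ used here.

```python
from fractions import Fraction

# Hypothetical test graph and an assumed simple random walk p(s'|s) = 1/k_s.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
V = list(ADJ)
EDGES = sorted({tuple(sorted((u, v))) for u in ADJ for v in ADJ[u]})


def p(dst, src):
    return Fraction(1, len(ADJ[src])) if dst in ADJ[src] else Fraction(0)


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(xi):
    """P[s0][r] = P(r|s0; xi) for the walk on G, from (I - xi T) x = e_{s0}."""
    M = [[int(r == c) - xi * p(r, c) for c in V] for r in V]
    return [solve(M, [int(r == s0) for r in V]) for s0 in V]


def X_formula(s0, xi):
    """Eqs. (62)-(63), evaluated exactly."""
    P = gen_fun(xi)
    total = Fraction(0)
    for s in V:
        W = Fraction(0)
        for s2 in ADJ[s]:  # alpha = p(s'|s) vanishes for non-neighbours
            alpha, beta = p(s2, s), p(s, s2)
            a, b, c, d = P[s2][s], P[s][s], P[s2][s2], P[s][s2]
            W += alpha * (1 + (d - c) * beta * xi) / (
                1 + (alpha * a + d * beta) * xi + (a * d - b * c) * alpha * beta * xi**2)
        total += W * P[s0][s]
    return xi / (1 - xi) * total


def X_series(s0, xi, nmax=400):
    """Gamma(s0; xi)/(1 - xi) from (19), Gamma = sum_e sum_j xi^j F_j(e|s0), truncated."""
    gamma = 0.0
    for e in EDGES:
        alive = {s0: 1.0}  # mass of walkers that have not crossed e yet
        for j in range(1, nmax + 1):
            nxt = {}
            for r, m in alive.items():
                for r2 in ADJ[r]:
                    w = m / len(ADJ[r])
                    if tuple(sorted((r, r2))) == e:
                        gamma += xi**j * w  # first crossing of e at step j
                    else:
                        nxt[r2] = nxt.get(r2, 0.0) + w
            alive = nxt
    return gamma / (1 - xi)


for s0 in V:
    assert abs(float(X_formula(s0, Fraction(1, 2))) - X_series(s0, 0.5)) < 1e-9
```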



B Properties of $X(s_0;\xi)$



Next we show a number of properties of the weight function $W(s;\xi)$. Before we do that, however, we need to establish a number of fundamental inequalities involving the generating functions (51), (52). Let us use the temporary notation (where all quantities are understood implicitly in the $\xi\to1^-$ limit, $\xi\in\mathbb{R}$):

$$\Delta=1+\alpha a+d\beta+(ad-bc)\alpha\beta. \tag{64}$$



The denominator $\Delta$ of (63) appears in the numerator of $P^\dagger(z|z;\xi)$ given by (59), and thus:

$$P^\dagger(z|z;1^-)=-\frac{\Delta}{D}. \tag{65}$$



Since the $P^\dagger$'s in (57)-(59) are all generating functions for random walk probabilities (that is, power series with non-negative coefficients), they are all positive (for $\xi\in\mathbb{R}$, $\xi\to1^-$). Hence, if we show that the determinant $D<0$, then the positivity of $P^\dagger(z|z;\xi)$ implies $\Delta>0$. From (54), in the $\xi\to1^-$ limit, $-D=1+w-(u\alpha+v\beta)$, where we used the normalization $f+g=1$. Let us examine the sign of this expression. Using the definitions from (55), we obtain:

$$-D=1+\alpha a(1-g)+d\beta(1-f)-\alpha fb-\beta gc. \tag{66}$$

Recall that $f+g=1$, with $f$ and $g$ being probabilities chosen arbitrarily. Then $-D$ can be rewritten as

$$-D=1+f\alpha(a-b)+g\beta(d-c). \tag{67}$$
The determinant $D$ also appears in the expressions (57) for the site occupancy generating functions $P^\dagger(s|s;\xi)$ and $P^\dagger(s|s';\xi)$. In the $\xi\to1^-$ limit, and using $1-f=g$, these expressions take the form

$$P^\dagger(s|s;1^-)=-\frac{1}{D}\left(b+g\beta t\right), \tag{68}$$

$$P^\dagger(s|s';1^-)=-\frac{1}{D}\left(a+g\beta t\right). \tag{69}$$

The sum $cP^\dagger(s|s;1^-)+dP^\dagger(s|s';1^-)$ is always non-negative ($c,d>0$), implying

$$cP^\dagger(s|s;1^-)+dP^\dagger(s|s';1^-)=-\frac{1}{D}\left[ad+bc+t\beta g(c+d)\right]\geq0. \tag{70}$$
As $a,b,c,d>0$ (for $\xi\to1^-$, $\xi\in\mathbb{R}$), $0<\beta\leq1$, and $g\in[0,1]$ is an arbitrarily chosen transition probability, for any given $\xi$ the value of $g$ can always be chosen small enough that $ad+bc+t\beta g(c+d)>0$, independently of the sign of $t$. (Note that $a,b,c,d$ do not depend on $g$ or $f$.) Thus, for small $g$ values, $D<0$, and based on (65) this implies that $\Delta>0$. Since $\Delta$ is independent of $f$ and $g$, this also implies that for arbitrary $f$ ($g=1-f$) we have:

$$-D=1+f\alpha(a-b)+g\beta(d-c)>0,\qquad\text{as }\xi\to1^-, \tag{71}$$

$$\Delta=1+(\alpha a+d\beta)+(ad-bc)\alpha\beta>0,\qquad\text{as }\xi\to1^-. \tag{72}$$

Hence, if we choose $f=1$, $g=0$ in (71), followed by $f=0$, $g=1$, we obtain:

$$1+\alpha(a-b)>0, \tag{73}$$

$$1+\beta(d-c)>0, \tag{74}$$



as £ — > 1 . Observe that (74i is also obtained by switching s, s', to s' and s respectively, in 



(73 1 (which holds for any s and s'). Using ( 16 1, and the fact that F(s|so! 0^1 even at l£l = 1 
(F(s|so; 1) = i?(s|so) is the probability that s is ever reached from sq), after replacing so by 
s' ytz s , we find that in the limit £ — > 1~ 

a - b < . (75) 



However, although a < b, the difference a — b cannot be arbitrarily negative, as shown in (73): 
a — b > — a" 1 . Similarly, it holds: 

d - c < . (76) 

An immediate consequence of these inequalities is that in the case of recurrent walks, where both $a$ and $b$ (and similarly $c$ and $d$) diverge as $\xi\to1^-$, their difference nevertheless remains bounded. From (74) and (72) it follows that every term in the expression of $W(s;1^-)$ is non-negative, and thus $W(s;1^-)\geq0$.
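The inequalities above are stated in the $\xi\to1^-$ limit. The sketch below only probes them at the single value $\xi=99/100$ on the hypothetical test graph with a simple random walk, so it illustrates rather than proves them. It checks (73)-(76) (together with the upper bound that appears later in (81)), the positivity of $\Delta$ in (72), and the sign of $-D$ from (54) for several $f$ with $g=1-f$.

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk p(s'|s) = 1/k_s.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)


def p(dst, src):
    return Fraction(1, len(ADJ[src])) if dst in ADJ[src] else Fraction(0)


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(xi):
    """P[s0][r] = P(r|s0; xi) for the walk on G."""
    M = [[int(r == c) - xi * p(r, c) for c in range(N)] for r in range(N)]
    return [solve(M, [int(r == s0) for r in range(N)]) for s0 in range(N)]


xi = Fraction(99, 100)  # illustrative value close to 1; the text takes xi -> 1^-
P = gen_fun(xi)
for s in range(N):
    for s2 in ADJ[s]:  # ordered neighbour pairs (s, s')
        alpha, beta = p(s2, s), p(s, s2)
        a, b, c, d = P[s2][s], P[s][s], P[s2][s2], P[s][s2]
        assert a < b and d < c                      # (75), (76)
        assert 0 < 1 + alpha * (a - b) < 1          # (73), upper bound as in (81)
        assert 0 < 1 + beta * (d - c) < 1           # (74), upper bound as in (81)
        assert 1 + alpha * a + d * beta + (a * d - b * c) * alpha * beta > 0  # (72)
        for f in (Fraction(0), Fraction(1, 2), Fraction(1)):
            g = 1 - f
            u, v = g * a + f * b, f * d + g * c     # notation (55)
            t, w = a * d - b * c, alpha * a + d * beta
            D = (-1 - w * xi + (u * alpha + v * beta - t * alpha * beta) * xi**2
                 + t * alpha * beta * xi**3)        # (54) with f + g = 1
            assert -D > 0                           # cf. (71)
```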

Next we show that $W(s;1^-)<\infty$ (finite). Recall that $1\leq X_n\leq n$ for $n\geq1$; that is, the number of discovered edges cannot grow faster than linearly with time (at most one new edge can be discovered in a time step). Written in terms of generating functions, this inequality becomes:

$$\frac{\xi}{1-\xi}\leq X(s_0;\xi)\leq\frac{\xi}{(1-\xi)^2},\qquad\xi\in\mathbb{R},\ \xi>0. \tag{77}$$



Let us assume, on the contrary, that $W(s;\xi)\to\infty$ as $\xi\to1^-$. This means that for any arbitrarily large constant $C$ there is a $\xi_0\in\mathbb{R}$, $\xi_0<1$, such that $W(s;\xi)>C$ for all $\xi>\xi_0$, $\xi\to1^-$. Thus:

$$X(s_0;\xi)>\frac{\xi}{1-\xi}\,C\sum_{s\in V}P(s|s_0;\xi)=C\,\frac{\xi}{(1-\xi)^2}, \tag{78}$$

where in the last step we used the identity $\sum_sP(s|s_0;\xi)=(1-\xi)^{-1}$, which is a direct consequence of the normalization condition $\sum_sP_n(s|s_0)=1$. Since $C$ is arbitrarily large, it can certainly be chosen such that $C>1$, and then (78) contradicts (77).
Let us write (63) in the form:

$$W(s;\xi)=\sum_{s'}\alpha\,\theta(s,s';\xi),\qquad \theta(s,s';\xi)=\frac{1+(d-c)\beta\xi}{1+(\alpha a+d\beta)\xi+(ad-bc)\alpha\beta\xi^2}. \tag{79}$$

Functions $a$ and $d$ can be expressed in terms of the first-passage time generating function by using relation (16): $a=F(s|s';\xi)P(s|s;\xi)=\eta b$ and $d=F(s'|s;\xi)P(s'|s';\xi)=\eta'c$, where we have introduced the notations $\eta=F(s|s';\xi)<1$ and $\eta'=F(s'|s;\xi)<1$. In the $\xi\to1^-$ limit, we obtain for $\theta(s,s',1^-)$:



$$\theta(s,s',1^-)=\frac{1-(1-\eta')c\beta}{1+b\alpha\eta+c\beta\eta'-bc(1-\eta\eta')\alpha\beta}. \tag{80}$$

The denominator can be written as $1+b\alpha\eta+c\beta\eta'-bc(1-\eta\eta')\alpha\beta=[1-(1-\eta')c\beta][1+(1+\eta)b\alpha]-b\alpha(1+c\beta\eta')+c\beta(1+b\alpha\eta)$. Let $u'=1-(1-\eta')c\beta$ and $u=1-(1-\eta)b\alpha$ (these $u,u'$ are unrelated to the $u$ of (55)). Since $\eta'c=d$ and $\eta b=a$, we have $u'=1+\beta(d-c)$ and $u=1+\alpha(a-b)$. Thus, from (73), (74) it follows that $u'>0$ and $u>0$. Moreover, we can write $1-\eta'=(1-u')/(c\beta)$ and $1-\eta=(1-u)/(b\alpha)$. Since $1-\eta'>0$ and $1-\eta>0$ (see above), and as both $c\beta$ and $b\alpha$ are positive, one must have:

$$0<u'<1,\qquad 0<u<1. \tag{81}$$



Using this notation, Eq. (80) is written as:

$$\theta(s,s',1^-)=\frac{u'}{u'[1+(1+\eta)b\alpha]+c\beta(1+b\alpha\eta)-b\alpha(1+c\beta\eta')}. \tag{82}$$

In the denominator we then use $1+c\beta\eta'=u'+c\beta$. After the cancellations we find:

$$\theta(s,s',1^-)=\frac{u'}{u'(1+b\alpha\eta)+c\beta(1+b\alpha\eta)-bc\alpha\beta}=\frac{u'}{uu'+u'b\alpha+uc\beta}. \tag{83}$$

The last equality was obtained after using $1+b\alpha\eta=u+b\alpha$. Hence:

$$W(s;1^-)=\frac{1}{b}\sum_{s'}\frac{\alpha u'}{u'\alpha+u\psi\beta+uu'/b}, \tag{84}$$

where $\psi=c/b=P(s'|s';1^-)/P(s|s;1^-)$. Taking into account (81), the terms in the sum are all positive and finite (even if $b\to\infty$); that is, there is a constant $C>0$ such that:

$$W(s;1^-)<\frac{C}{b}. \tag{85}$$

(Recall that, due to normalization, $\sum_{s'}\alpha=\sum_{s'}p(s'|s)=1$.) From (62), this implies that in the limit $\xi\to1^-$ ($\xi$ very close to 1 but not quite 1):

$$X(s_0;\xi)<\frac{\xi}{1-\xi}\sum_{s\in V}\frac{C}{P(s|s;\xi)}\,P(s|s_0;\xi)\leq C\,S(s_0;\xi). \tag{86}$$
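A sketch checking the algebra of (80)-(84) at $\xi=99/100$ on the same hypothetical graph with a simple random walk: $\eta$ and $\eta'$ are recovered from (16) as $a/b$ and $d/c$, the two forms of $\theta$ are compared exactly, and (84) is compared with $\sum_{s'}\alpha\theta$. For the constant in (85) the check uses the degree $k_s$, which follows from (83) by dropping the positive terms $uu'$ and $uc\beta$ from the denominator; this particular choice of $C$ is ours, not the text's.

```python
from fractions import Fraction

# Hypothetical test graph and assumed simple random walk p(s'|s) = 1/k_s.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
N = len(ADJ)


def p(dst, src):
    return Fraction(1, len(ADJ[src])) if dst in ADJ[src] else Fraction(0)


def solve(M, rhs):
    """Exact Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def gen_fun(xi):
    """P[s0][r] = P(r|s0; xi) for the walk on G."""
    M = [[int(r == c) - xi * p(r, c) for c in range(N)] for r in range(N)]
    return [solve(M, [int(r == s0) for r in range(N)]) for s0 in range(N)]


xi = Fraction(99, 100)  # illustrative stand-in for xi -> 1^-
P = gen_fun(xi)
for s in range(N):
    b = P[s][s]
    W80, W84 = Fraction(0), Fraction(0)
    for s2 in ADJ[s]:
        alpha, beta = p(s2, s), p(s, s2)
        a, c, d = P[s2][s], P[s2][s2], P[s][s2]
        eta, eta_p = a / b, d / c                  # via (16): a = eta*b, d = eta'*c
        u, u_p = 1 - (1 - eta) * b * alpha, 1 - (1 - eta_p) * c * beta
        assert 0 < u < 1 and 0 < u_p < 1           # (81)
        theta80 = (1 - (1 - eta_p) * c * beta) / (
            1 + b * alpha * eta + c * beta * eta_p - b * c * (1 - eta * eta_p) * alpha * beta)
        theta83 = u_p / (u * u_p + u_p * b * alpha + u * c * beta)
        assert theta80 == theta83                  # (80) equals (83)
        psi = c / b
        W80 += alpha * theta80
        W84 += alpha * u_p / (u_p * alpha + u * psi * beta + u * u_p / b)
    assert W80 == W84 / b                          # (84)
    assert b * W80 < len(ADJ[s])                   # (85) with C = k_s (our choice)
```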
Clearly, after the first loop has been made by the walker (which happens after long enough times; otherwise we have $X_n=S_n-1$ identically, in which case trivially $\mu=\lambda$), we have $S_n\leq X_n$ and hence:

$$S(s_0;\xi)<X(s_0;\xi)<C\,S(s_0;\xi),\qquad\text{as }\xi\to1^-, \tag{87}$$

showing that the leading-order behaviors of $\langle X_n\rangle$ and $\langle S_n\rangle$ are identical, i.e., $\mu=\lambda$.

Discussion. Recall that $\langle S_n\rangle\sim n^\lambda$ ($\langle X_n\rangle\sim n^\mu$) means $\langle S_n\rangle\sim n^\lambda L(n)$ ($\langle X_n\rangle\sim n^\mu F(n)$) as $n\to\infty$, where $L(n)$, $F(n)$ are slowly varying functions, that is, $L(\zeta x)/L(x)\to1$ ($F(\zeta'x)/F(x)\to1$) when $x\to\infty$, for any $\zeta,\zeta'>0$.

i) When $\mu=\lambda\neq0$, $\langle S_n\rangle$ and $\langle X_n\rangle$ obey the same scaling laws; however, for early times the corrections $L(n)$ and $F(n)$ can be dominant. This results in a slight deviation between the two curves (e.g., for the Sierpinski gasket and the square lattice, as seen in the main article). For large graphs, in the $n\to\infty$ limit, these corrections diminish.

ii) When $\mu=\lambda=0$, the growth of $\langle S_n\rangle$ is characterized by the function $L(n)$, while the growth of $\langle X_n\rangle$ is dictated by $F(n)$, and the two are not necessarily the same. This case is shown in Fig. 7 for the scale-free BA model. For simple random walks, $\langle S_n\rangle$ and $\langle X_n\rangle$ grow at the same rate, while for a walk that is biased towards high-degree nodes, after a short linear growth region they slowly grow following different curves until they saturate. For the STP walk biased towards higher degrees we took $p(s'|s)=k_{s'}\left(\sum_{s''\in(s)}k_{s''}\right)^{-1}$, where $k_s$ is the degree of node $s$ and $(s)$ denotes the set of graph neighbors of $s$.

Figure 7: Comparison of the average number of discovered nodes (blue lines) and edges (red lines) on the scale-free BA model ($N=25000$, $\langle k\rangle=10$), for a simple random walk (dashed lines) and for a walk that is biased towards high-degree nodes (solid lines).
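The behavior in Fig. 7 can be imitated qualitatively with a small Monte Carlo simulation. The sketch below is an illustration under stated assumptions: the graph is a hypothetical small preferential-attachment graph (a few hundred nodes rather than $N=25000$, grown from a complete seed graph), and `ba_graph`, `explore` and `averaged` are our own names. It records the numbers of discovered nodes $S_n$ and edges $X_n$ for the simple walk and for the degree-biased walk $p(s'|s)=k_{s'}/\sum_{s''\in(s)}k_{s''}$.

```python
import random


def ba_graph(n, m, rng):
    """Hypothetical small Barabasi-Albert-style graph grown from a complete graph on m + 1 nodes."""
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    stubs = [v for v in adj for _ in adj[v]]  # node v appears k_v times
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))     # degree-proportional choice
        adj[new] = set(chosen)
        for t in chosen:
            adj[t].add(new)
            stubs += [new, t]
    return {v: sorted(nb) for v, nb in adj.items()}


def explore(adj, steps, biased, rng):
    """One walk; S[n], X[n] = numbers of discovered nodes and edges after n steps."""
    s = rng.choice(sorted(adj))
    nodes, edges = {s}, set()
    S, X = [1], [0]
    for _ in range(steps):
        nbrs = adj[s]
        if biased:  # p(s'|s) = k_{s'} / sum_{s'' in (s)} k_{s''}
            s_next = rng.choices(nbrs, weights=[len(adj[v]) for v in nbrs])[0]
        else:       # simple random walk, p(s'|s) = 1 / k_s
            s_next = rng.choice(nbrs)
        edges.add(frozenset((s, s_next)))
        nodes.add(s_next)
        s = s_next
        S.append(len(nodes))
        X.append(len(edges))
    return S, X


def averaged(adj, steps, biased, runs, seed=0):
    """Monte Carlo estimates of <S_n> and <X_n> from `runs` independent walks."""
    rng = random.Random(seed)
    sum_S, sum_X = [0] * (steps + 1), [0] * (steps + 1)
    for _ in range(runs):
        S, X = explore(adj, steps, biased, rng)
        sum_S = [u + v for u, v in zip(sum_S, S)]
        sum_X = [u + v for u, v in zip(sum_X, X)]
    return [x / runs for x in sum_S], [x / runs for x in sum_X]


if __name__ == "__main__":
    G = ba_graph(400, 5, random.Random(1))
    for biased in (False, True):
        mS, mX = averaged(G, 3000, biased, runs=20)
        print("biased" if biased else "simple", mS[-1], mX[-1])
```

Since every newly discovered node is reached through a newly discovered edge, $S_n\leq X_n+1$ holds along each walk; on a dense graph one expects $X_n$ to keep growing after $S_n$ has saturated, although the precise curves depend on the illustrative graph used here.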